Abstract

We prove that in a general zero-sum repeated game where the first player is more informed than the
second player and controls the evolution of information on the state, the uniform value exists. This
result extends previous results on Markov decision processes with partial observation (Rosenberg, Solan,
Vieille [11]) and on repeated games with an informed controller (Renault [10]). Our formal definition of
a more informed player is more general than the inclusion of signals, and therefore allows for imperfect
monitoring of actions. We construct an auxiliary stochastic game with perfect monitoring whose state
space is the set of second order beliefs of player 2 (beliefs about beliefs of player 1 on the true state
variable of the initial game), and we prove it has a value by using a result of Renault [10]. A key element
in this work is to prove that player 1 can use strategies of the auxiliary game in the initial game in
our general framework, which allows us to deduce, by classical arguments, that the value of the
auxiliary game is also the value of our initial repeated game.
1 Introduction

Zero-sum repeated games with incomplete information were introduced by Aumann and
Maschler in 1966 [1] in order to study repeated interactions between two players having
different information. The authors also introduced a notion of value for these games, usually
called the uniform value, and proved its existence for games with incomplete information on one
side. Mertens and Neyman [4] proved that the uniform value exists for finite stochastic games,
and several works have since been devoted to proving the existence of the uniform value for some
subclasses of the general model of repeated games. Recently, Renault proved in [10] that the
uniform value exists in repeated games with an informed controller, using an approach based
on an existence result for dynamic programming problems (Renault [9]). The existence theorem
in [10] requires that the first player observes the state variable at each stage and controls
and observes the evolution of the beliefs of the second player on the state variable.

In the present work, we prove that the uniform value exists in the class of repeated games with 
a more informed controller. Our existence result requires that the first player is more informed 
about the state variable than the second player and also that he controls the evolution of beliefs 
of the second player. A weaker version of our result was conjectured in the conclusion of [10], 
and it was suggested that the proof may be based on an auxiliary game whose state space 
would be the pair of beliefs of both players about the original state variable. We show that 
the analysis actually requires introducing an auxiliary game whose state space is the set of
second order beliefs of the less informed player, and we provide a set of assumptions weaker than
those suggested in [10], which makes it possible to deal with imperfect monitoring of actions.

The paper is organized as follows: In section 2, we describe the general model of repeated 
games and introduce three assumptions that formalize the notion of a more informed con- 
troller. In section 3, we check that several models previously studied in the literature satisfy 
these three assumptions. Section 4 is dedicated to a discussion of the assumptions and a 
precise study of their implications. In addition, we provide a second version of the theorem 
with stronger, but easier to check, assumptions. Section 5, the last one, is dedicated to the proof of
existence of the uniform value. We introduce there an auxiliary stochastic game with perfect 
monitoring on an auxiliary state variable which represents the beliefs of player 2 about the 
beliefs of player 1 about the state variable of the original game. We prove that this auxiliary 
game has a uniform value using the main theorem of Renault [10] and that player 1 can use 
optimal strategies in this auxiliary stochastic game in order to play optimally in the original 
repeated game. Finally, we prove that player 2 can also guarantee this value by playing by 
blocks, so that both games have a uniform value and these values are equal. 

2 Model 

2.1 General definitions and notation 

For any metric space $X$, let $\Delta(X)$ denote the set of Borel probability distributions on $X$. If
$X$ is a finite set (endowed with the discrete metric) of cardinality $|X|$, then $\Delta(X)$ is precisely
the $|X|$-dimensional simplex. $\Delta_f(X) \subset \Delta(X)$ denotes the set of probability distributions
supported on a finite subset of $X$, and $\delta_x$ denotes the Dirac measure on $x \in X$.

A zero-sum repeated game is described by an 8-tuple $(K, I, J, g, C, D, \pi, q)$, where $K$ is the
state space, $I$ and $J$ are the action sets for player 1 and 2 respectively, $g$ is a payoff function
$g : K \times I \times J \to [0, 1]$, and $C$ and $D$ are the signal sets for player 1 and 2 respectively.
$\pi \in \Delta_f(K \times \mathbb{N} \times \mathbb{N})$ denotes the initial probability and
$q : K \times I \times J \to \Delta(K \times C \times D)$ denotes the transition function.

The game is played as follows: At the beginning of the game, the triple $(k_1, c_1, d_1)$ is
chosen according to the initial probability distribution $\pi \in \Delta_f(K \times \mathbb{N} \times \mathbb{N})$.
At each stage $m \geq 1$, player 1 observes the signal $c_m$ and player 2 observes the signal $d_m$. Then both
players choose actions $(i_m, j_m) \in I \times J$ based on their own past actions and on the sequence
of signals they observed (i.e. we assume perfect recall). Given the state $k_m$ and the actions
$(i_m, j_m)$, a new triple $(k_{m+1}, c_{m+1}, d_{m+1}) \in K \times C \times D$ is chosen according to the
probability distribution $q(k_m, i_m, j_m)$. The payoff for stage $m$ of player 1 is $g(k_m, i_m, j_m)$ and
the game proceeds to stage $m+1$. The stage payoffs are not directly observed by the players and cannot
be deduced, in general, from their observations. The sets of initial signals can be any finite
subsets of $\mathbb{N}$. This generalization is for technical reasons only: it will be very convenient
in the sequel to consider this possibly larger set of initial signals in order to have a simple
way to deal with the recursive structure of the game.
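
The timing described above can be illustrated by a short simulation. The following Python sketch is only an illustration under stated assumptions: the toy game (two states, a frozen state, a player 1 who observes the state, an uninformed player 2) and the strategies `sigma` and `tau` are hypothetical and do not come from the paper.

```python
import random

def sample(dist, rng):
    """Draw one outcome from a finite distribution {outcome: probability}."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes], k=1)[0]

def play(pi, q, g, sigma, tau, n_stages, rng):
    """Simulate n_stages of play and return the list of stage payoffs."""
    k, c, d = sample(pi, rng)          # initial triple (k_1, c_1, d_1)
    h1, h2 = [c], [d]                  # private histories of players 1 and 2
    payoffs = []
    for _ in range(n_stages):
        i = sample(sigma(tuple(h1)), rng)
        j = sample(tau(tuple(h2)), rng)
        payoffs.append(g[(k, i, j)])   # stage payoff, not observed by the players
        k, c, d = sample(q[(k, i, j)], rng)
        h1 += [i, c]                   # perfect recall: own action, then new signal
        h2 += [j, d]
    return payoffs

# Hypothetical toy game (assumption, for illustration only).
K, I, J = (0, 1), ("T", "B"), ("L", "R")
pi = {(0, 0, 0): 0.5, (1, 1, 0): 0.5}   # player 1 learns k_1, player 2 learns nothing
q = {(k, i, j): {(k, k, 0): 1.0} for k in K for i in I for j in J}  # state never moves
g = {(k, i, j): 1.0 if (k == 0) == (i == "T") else 0.0
     for k in K for i in I for j in J}
sigma = lambda h1: {"T": 1.0} if h1[-1] == 0 else {"B": 1.0}  # match the last signal
tau = lambda h2: {"L": 0.5, "R": 0.5}

payoffs = play(pi, q, g, sigma, tau, n_stages=5, rng=random.Random(0))
```

In this toy game player 1 always matches the state, so every stage payoff equals 1 whatever the random draws are.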

The information held by player 1 before his play at stage $m$, called player 1's private
history, is given by

$$h^{I}_m = (c_1, i_1, \ldots, c_{m-1}, i_{m-1}, c_m) \in \mathbb{N} \times (I \times C)^{m-1}.$$

Similarly, the information held by player 2 is represented by

$$h^{II}_m = (d_1, j_1, \ldots, d_{m-1}, j_{m-1}, d_m) \in \mathbb{N} \times (J \times D)^{m-1}.$$

Let $\mathcal{H}^{I}$ (resp. $\mathcal{H}^{II}$) denote the set of all finite private histories for player 1
(resp. for player 2). We assume the sets $K$, $I$, $J$, $C$ and $D$ are all finite and that the
description of the model is common knowledge.

Instead of $\mathbb{N}$, the sets of initial signals will often be denoted by $C'$ and $D'$, where $C'$ and
$D'$ are finite subsets of $\mathbb{N}$. With a slight abuse of notation, we will also write
$\pi \in \Delta(K \times C' \times D')$. The initial signals will still be denoted by $(c_1, d_1)$.
Conversely, given finite sets $C'$ and $D'$, any $\pi \in \Delta(K \times C' \times D')$ can be seen as an
element of $\Delta_f(K \times \mathbb{N} \times \mathbb{N})$ using some enumerations of $C'$ and $D'$. The
main advantage is that any pair of finite private histories can be embedded in
$\mathbb{N} \times \mathbb{N}$ via some enumerations. This advantage will become clear in the proof.

Strategies A behavior strategy for player 1 is a map from private histories $\mathcal{H}^{I}$ to
probabilities over $I$. The set of behavior strategies of player 1 is denoted by $\Sigma$. Every strategy
$\sigma \in \Sigma$ corresponds to a sequence $(\sigma_m)_{m \geq 1}$, where $\sigma_m$ is defined on the
set of histories up to stage $m$. That is,

$$\sigma_m : \mathbb{N} \times (I \times C)^{m-1} \to \Delta(I).$$

Similarly, a behavior strategy $\tau$ for player 2 is a map from private histories $\mathcal{H}^{II}$ to
probability distributions over $J$. The set of behavior strategies of player 2 is denoted by
$\mathcal{T}$. Any $\tau \in \mathcal{T}$ corresponds to a sequence $(\tau_m)_{m \geq 1}$, with

$$\tau_m : \mathbb{N} \times (J \times D)^{m-1} \to \Delta(J).$$

The initial distribution $\pi$, the transition function $q$ and a behavior strategy profile
$(\sigma, \tau) \in \Sigma \times \mathcal{T}$ induce a unique probability distribution over the set of plays
$K \times \mathbb{N} \times \mathbb{N} \times (K \times C \times D \times I \times J)^{\infty}$, denoted by
$\mathbb{P}^{\pi}_{\sigma,\tau}$. Let $\mathbb{E}^{\pi}_{\sigma,\tau}$ denote the expectation with respect to
the probability $\mathbb{P}^{\pi}_{\sigma,\tau}$.

Evaluations of the payoff A second component of the model is the way in which the total
payoff of player 1 is evaluated, in terms of the sequence of stage payoffs $(g(k_m, i_m, j_m))_{m \geq 1}$.
The two classical evaluations correspond to the $n$-stage game and the $\lambda$-discounted game. In
the former, the payoff function is the expected Cesàro mean of the stage payoffs of the first $n$
stages, i.e.

$$\gamma_n(\pi, \sigma, \tau) = \mathbb{E}^{\pi}_{\sigma,\tau}\Big[\frac{1}{n} \sum_{m=1}^{n} g(k_m, i_m, j_m)\Big].$$

In the latter, the payoff is taken as the expected Abel sum, with respect to the discount factor
$0 < \lambda < 1$, i.e.

$$\gamma_\lambda(\pi, \sigma, \tau) = \mathbb{E}^{\pi}_{\sigma,\tau}\Big[\lambda \sum_{m \geq 1} (1 - \lambda)^{m-1} g(k_m, i_m, j_m)\Big].$$

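More generally, one may consider any compact evaluation. That is, for any $\theta \in \Delta(\mathbb{N}^*)$, let

$$\gamma_\theta(\pi, \sigma, \tau) = \mathbb{E}^{\pi}_{\sigma,\tau}\Big[\sum_{m \geq 1} \theta_m \, g(k_m, i_m, j_m)\Big]. \qquad (2.1)$$
Denote by $\Gamma_\theta(\pi)$ the 8-tuple defined above together with the $\theta$-evaluation.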
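
The three evaluations can be compared on a single realized sequence of stage payoffs. The Python sketch below is a minimal illustration, assuming a finite payoff sequence (so the Abel sum is truncated) and a weight vector `theta` with finite support; the function names and the sample sequence are hypothetical.

```python
from fractions import Fraction

def cesaro_mean(payoffs, n):
    """n-stage evaluation: average of the first n stage payoffs."""
    return sum(payoffs[:n]) / n

def abel_sum(payoffs, lam):
    """lambda-discounted evaluation, truncated to the available stages.

    Stage m (1-indexed) receives weight lam * (1 - lam) ** (m - 1).
    """
    return lam * sum((1 - lam) ** m * g for m, g in enumerate(payoffs))

def theta_evaluation(payoffs, theta):
    """General evaluation for weights theta = {stage m: theta_m} (finite support)."""
    return sum(w * payoffs[m - 1] for m, w in theta.items())

# Hypothetical realized stage payoffs g(k_m, i_m, j_m), m = 1..4.
payoffs = [Fraction(1), Fraction(0), Fraction(1), Fraction(1)]
uniform_4 = {m: Fraction(1, 4) for m in range(1, 5)}

n_stage_value = cesaro_mean(payoffs, 4)
discounted_value = abel_sum(payoffs, Fraction(1, 2))
general_value = theta_evaluation(payoffs, uniform_4)
```

Taking `theta` uniform on the first $n$ stages recovers the $n$-stage evaluation, and taking $\theta_m = \lambda(1-\lambda)^{m-1}$ recovers the discounted one, which is why (2.1) covers both classical cases.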
The value function For any $\pi \in \Delta_f(K \times \mathbb{N} \times \mathbb{N})$ and any
$\theta \in \Delta(\mathbb{N}^*)$, $\Gamma_\theta(\pi)$ is known to have a value, denoted by $v_\theta(\pi)$.
It satisfies

$$v_\theta(\pi) = \sup_{\sigma \in \Sigma} \inf_{\tau \in \mathcal{T}} \gamma_\theta(\pi, \sigma, \tau) = \inf_{\tau \in \mathcal{T}} \sup_{\sigma \in \Sigma} \gamma_\theta(\pi, \sigma, \tau).$$

Remark 2.1. These general evaluations will be used in section 5, and we will only need to
consider probabilities with finite support (i.e. $\theta \in \Delta_f(\mathbb{N}^*)$).

Let $\sigma \in \Sigma$ be a behavior strategy and let $h \in \mathcal{H}^{I}$ be some finite private history
of player 1. We denote by $\sigma(h)$ the behavior strategy of player 1 after the history $h$. Equivalently,
$\sigma(h)$ is the restriction of the map $\sigma$ to the subset of histories beginning with $h$. In
particular, given some strategy profile $(\sigma, \tau) \in \Sigma \times \mathcal{T}$ and two signals
$(c, d) \in C' \times D'$, consider the profile $(\sigma(c), \tau(d))$. It may be interpreted as a strategy
profile in a game in which the players have no initial signals. More formally, for any
$p \in \Delta(K)$, we will use the following notation:

$$\gamma_\theta(k, \sigma(c), \tau(d)) = \gamma_\theta(\delta_{(k,c,d)}, \sigma, \tau), \quad \text{and} \quad \gamma_\theta(p, \sigma(c), \tau(d)) = \gamma_\theta(p \otimes \delta_{(c,d)}, \sigma, \tau).$$

With this notation, the payoff can be written as

$$\gamma_\theta(\pi, \sigma, \tau) = \mathbb{E}_\pi[\gamma_\theta(k, \sigma(c), \tau(d))] = \sum_{(k,c,d) \in K \times C' \times D'} \gamma_\theta(k, \sigma(c), \tau(d)) \, \pi(k, c, d).$$

Alternatively, one can consider the game as having per se infinitely many stages. 

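Uniform value The infinitely repeated game is denoted by $\Gamma_\infty(\pi)$. Let us present here
some important definitions relative to the game $\Gamma_\infty(\pi)$. Its value will be called the uniform
value and denoted by $v_\infty(\pi)$.

Definition 2.2. Let $v$ be a real number.

• Player 1 can guarantee $v$ in $\Gamma_\infty(\pi)$ if for any $\varepsilon > 0$ there exists a strategy
$\sigma \in \Sigma$ of player 1 and an integer $N \in \mathbb{N}$, such that

$$\forall n \geq N, \ \forall \tau \in \mathcal{T}, \quad \gamma_n(\pi, \sigma, \tau) \geq v - \varepsilon.$$

We say that such a strategy $\sigma$ guarantees $v - \varepsilon$ in $\Gamma_\infty(\pi)$ and define

$$\underline{v}_\infty(\pi) = \sup\{v \in \mathbb{R} \mid \text{player 1 can guarantee } v\}.$$

• Player 2 can guarantee $v$ in $\Gamma_\infty(\pi)$ if for any $\varepsilon > 0$ there exists a strategy
$\tau \in \mathcal{T}$ of player 2 and an integer $N \in \mathbb{N}$, such that

$$\forall n \geq N, \ \forall \sigma \in \Sigma, \quad \gamma_n(\pi, \sigma, \tau) \leq v + \varepsilon.$$

We say that such a strategy $\tau$ guarantees $v + \varepsilon$ in $\Gamma_\infty(\pi)$ and define

$$\overline{v}_\infty(\pi) = \inf\{v \in \mathbb{R} \mid \text{player 2 can guarantee } v\}.$$

• If $\underline{v}_\infty(\pi) = \overline{v}_\infty(\pi)$, the uniform value exists and we denote by
$v_\infty(\pi)$ the common value.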

The existence of a uniform value is stronger than the existence of a limit value (or 
asymptotic value), in the sense that it implies (see e.g. Neyman and Sorin [6, Theorem 1] for 
more general evaluations) 

$$\lim_{\lambda \to 0} v_\lambda = \lim_{n \to \infty} v_n = v_\infty.$$
2.2 Model with a more informed controller

We will consider a particular class of the general model presented above, which generalizes 
both the class of repeated games considered by Renault [10] and the model of Partially 
Observable Markov Decision Processes (see section 3.1). As usual in games with incomplete
information, we call belief of player 1 at stage $m$ about some random variable $\xi$ the conditional
law of $\xi$ given the information held by player 1 at stage $m$. In the sequel, the first order beliefs
of a player denote beliefs about the state variable $k$, and second order beliefs of a player denote
beliefs about the first order beliefs of his opponent.

We assume the following three hypotheses at every stage $m$ of the game:
(a1) Player 1's first order belief is more accurate than player 2's first order belief.
(a2) Player 1 can compute the second order beliefs of player 2.
(a3) Player 1 controls the evolution of second order beliefs of player 2.

The main result of this paper is to establish the existence of the uniform value under these
assumptions. Let the formal transcription of (a1)-(a3) defined below be denoted by (A1)-(A3).

Theorem 2.3. Let $\Gamma$ be a repeated game with a more informed controller, i.e. such that
assumptions (A1), (A2), (A3) hold. Then the uniform value exists.

Remark 2.4. It was already pointed out in the literature (see e.g. Mertens [3]) that in games 
with a more informed player, the analysis of beliefs can be restricted to second order beliefs of 
the less informed player. In this work, the definition of more informed is slightly more general 
than the inclusion of signals and a similar reduction is made formally in Lemma 4.4.

2.3 Formal assumptions 

Let us present here a rigorous transcription of the informal assumptions (a1)-(a3). In the
next section, we will present some of the models which satisfy our three assumptions and to 
which, consequently, Theorem 2.3 applies. 

Let us start with some notation:

Given some probability distribution $\mu \in \Delta_f(X \times Y)$ over a product, we denote by

$$\mu(x) = \sum_{y \in Y} \mu(x, y).$$

For any random variable $\xi$ defined on a probability space $(\Omega, \mathcal{A}, \mathbb{P})$ and
$\mathcal{F}$ a sub $\sigma$-algebra of $\mathcal{A}$, let $\mathcal{L}_{\mathbb{P}}(\xi \mid \mathcal{F})$ denote
the conditional distribution of $\xi$ given $\mathcal{F}$, which is seen as an $\mathcal{F}$-measurable
random variable$^1$, and let $\mathcal{L}_{\mathbb{P}}(\xi)$ denote the distribution of $\xi$.

In the sequel, both functions $g$ and $q$ are linearly extended to $\Delta(K) \times I \times J$.

Assumption (a1) can now be formalized as follows:

$^1$ All random variables appearing here take only finitely many values so that the definition of conditional
laws does not require any additional care about measurability.

(A1) $\forall n \geq 1$, $\forall (\sigma, \tau) \in \Sigma \times \mathcal{T}$, $\mathcal{L}_{\mathbb{P}^{\pi}_{\sigma,\tau}}(k_n \mid h^{I}_n, h^{II}_n) = \mathcal{L}_{\mathbb{P}^{\pi}_{\sigma,\tau}}(k_n \mid h^{I}_n)$.

In words, at every stage and given any strategy profile, player 2's information does not contain
any information about the state variable that is not already contained in player 1's information.
Assumption (A1) is equivalent to the conditional independence of $k_n$ and $h^{II}_n$ given $h^{I}_n$,
under the probability $\mathbb{P}^{\pi}_{\sigma,\tau}$.

For $n = 1$, this equation does not depend on $\sigma$ and $\tau$ and it can be reformulated as

(A1a) $\pi(c)\,\pi(k, c, d) = \pi(k, c)\,\pi(c, d)$, $\forall (k, c, d) \in K \times C' \times D'$.
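
Condition (A1a) is a finite system of equalities, so it can be checked mechanically. The Python sketch below is a minimal illustration: the function names and the two sample distributions are hypothetical, and signals are represented by arbitrary hashable labels rather than integers.

```python
from collections import defaultdict
from fractions import Fraction

def marginals(pi):
    """Return the marginals pi(c), pi(k, c) and pi(c, d) of pi on K x C' x D'."""
    pi_c, pi_kc, pi_cd = (defaultdict(Fraction) for _ in range(3))
    for (k, c, d), p in pi.items():
        pi_c[c] += p
        pi_kc[(k, c)] += p
        pi_cd[(c, d)] += p
    return pi_c, pi_kc, pi_cd

def satisfies_A1a(pi, K, C, D):
    """Check pi(c) * pi(k, c, d) == pi(k, c) * pi(c, d) for every (k, c, d)."""
    pi_c, pi_kc, pi_cd = marginals(pi)
    return all(
        pi_c[c] * pi.get((k, c, d), 0) == pi_kc[(k, c)] * pi_cd[(c, d)]
        for k in K for c in C for d in D
    )

# Hypothetical example 1: player 2's signal is uninformative, (A1a) holds.
pi_ok = {(0, "a", "x"): Fraction(1, 4),
         (1, "a", "x"): Fraction(1, 4),
         (1, "b", "x"): Fraction(1, 2)}

# Hypothetical example 2: player 2 learns the state, player 1 does not.
pi_bad = {(0, "a", 0): Fraction(1, 2),
          (1, "a", 1): Fraction(1, 2)}

ok = satisfies_A1a(pi_ok, K=(0, 1), C=("a", "b"), D=("x",))
bad = satisfies_A1a(pi_bad, K=(0, 1), C=("a",), D=(0, 1))
```

In the second example the signal of player 2 reveals the state while player 1 receives a constant signal, so $k_1$ and $d_1$ are not conditionally independent given $c_1$ and the check fails.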

In order to model the players' information about the state variable at stage $n$, we need
to define three variables $x_n$, $y_n$ and $\eta_n$. Before choosing their first action, the players receive
signals $(c_1, d_1) \in C' \times D'$. The (random) variable

$$x_1 = \mathcal{L}_\pi(k_1 \mid c_1) \in \Delta(K)$$

represents the first order beliefs of player 1 about the initial state. Let $x_1(c_1) \in \Delta(K)$
denote its realization, i.e. the beliefs of player 1 once he has received the signal $c_1 \in C'$.
Thus, $x_1(c_1) = \mathcal{L}_\pi(k_1 \mid c_1)$ and each signal $c_1 \in C'$ occurs with probability $\pi(c_1)$, so that
$\mathcal{L}_\pi(x_1) = \sum_{c_1 \in C'} \pi(c_1) \delta_{x_1(c_1)}$. Similarly, define the second order beliefs of
player 2, i.e. beliefs about player 1's beliefs about the initial state

$$y_1 = \mathcal{L}_\pi(x_1 \mid d_1) \in \Delta_f(\Delta(K)).$$

With probability $\pi(d_1)$, player 2's beliefs about player 1's beliefs (about the state variable)
are distributed as follows:

$$y_1(d_1) = \sum_{c_1 \in C'} \pi(c_1 \mid d_1) \, \delta_{x_1(c_1)} \in \Delta_f(\Delta(K)),$$

with a slight abuse of notation, since we write $\pi(c_1 \mid d_1)$ instead of $\pi(c_1 = c \mid d_1)$ with a sum
over $c \in C'$. Finally, let $\eta_1$ be the distribution of the second order beliefs of player 2

$$\eta_1 = \mathcal{L}_\pi(y_1) = \sum_{d_1 \in D'} \pi(d_1) \, \delta_{y_1(d_1)} \in \Delta_f(\Delta_f(\Delta(K))).$$

Notice that the Dirac measures involved in the definition of $\mathcal{L}_\pi(x_1)$ or of $\mathcal{L}_\pi(y_1)$ refer to
different spaces: the former refers to $\Delta(K)$, the latter to $\Delta_f(\Delta(K))$.
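
The three initial variables can be computed directly from a finite $\pi$. The following Python sketch is an illustration only: the data are hypothetical (they are not the distribution of any example of the paper), a belief in $\Delta(K)$ is encoded as a tuple indexed like `K`, and an element of $\Delta_f(\Delta(K))$ is encoded as a sorted tuple of (belief, probability) pairs so that it can serve as a dictionary key.

```python
from collections import defaultdict
from fractions import Fraction

def first_order_beliefs(pi, K):
    """x_1(c): conditional law of k_1 given c_1 = c, as a tuple indexed like K."""
    pi_c = defaultdict(Fraction)
    for (k, c, d), p in pi.items():
        pi_c[c] += p
    return {
        c: tuple(sum(p for (k2, c2, _), p in pi.items() if (k2, c2) == (k, c)) / pc
                 for k in K)
        for c, pc in pi_c.items()
    }

def second_order_beliefs(pi, K):
    """y_1(d): conditional law of x_1 given d_1 = d, as sorted (belief, prob) pairs."""
    x = first_order_beliefs(pi, K)
    pi_d, joint = defaultdict(Fraction), defaultdict(Fraction)
    for (k, c, d), p in pi.items():
        pi_d[d] += p
        joint[(d, x[c])] += p          # signals c with equal beliefs are merged
    return {
        d: tuple(sorted((xv, p / pd) for (d2, xv), p in joint.items() if d2 == d))
        for d, pd in pi_d.items()
    }

def law_of_second_order_beliefs(pi, K):
    """eta_1: distribution of y_1 under pi."""
    y = second_order_beliefs(pi, K)
    eta = defaultdict(Fraction)
    for (k, c, d), p in pi.items():
        eta[y[d]] += p
    return dict(eta)

# Hypothetical data: c_1 = (private signal, public signal), d_1 = public signal.
K = (0, 1)
pi = {(0, ("s1", "u1"), "u1"): Fraction(1, 4),
      (0, ("s2", "u1"), "u1"): Fraction(1, 8),
      (1, ("s2", "u1"), "u1"): Fraction(1, 8),
      (1, ("s2", "u2"), "u2"): Fraction(1, 2)}

x1 = first_order_beliefs(pi, K)
y1 = second_order_beliefs(pi, K)
eta1 = law_of_second_order_beliefs(pi, K)
```

Note the three nested levels: `x1` takes values in $\Delta(K)$, `y1` takes values in $\Delta_f(\Delta(K))$, and `eta1` is a single element of $\Delta_f(\Delta_f(\Delta(K)))$.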

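More generally, for some fixed strategy profile $(\sigma, \tau) \in \Sigma \times \mathcal{T}$, let us denote
the first order beliefs of player 1 at stage $n$ by $x_n \in \Delta(K)$, the second order beliefs of player 2
at stage $n$ by $y_n \in \Delta_f(\Delta(K))$, and the distribution of $y_n$ by $\eta_n$.

Definition 2.5. Put $x_n = \mathcal{L}_{\mathbb{P}^{\pi}_{\sigma,\tau}}(k_n \mid h^{I}_n)$, $y_n = \mathcal{L}_{\mathbb{P}^{\pi}_{\sigma,\tau}}(x_n \mid h^{II}_n)$, and $\eta_n = \mathcal{L}_{\mathbb{P}^{\pi}_{\sigma,\tau}}(y_n)$.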

Let us illustrate these definitions through the following example. 

Example 2.6. Let $K = \{k_1, k_2\}$ be the state space, $U = \{u_1, u_2\}$ a set of public signals and
$S = \{s_1, s_2, s_3\}$ a set of private signals for player 1. Using the notation above, let $C = U \times S$
(resp. $D = U$) be the set of signals for player 1 (resp. 2). We consider $\pi \in \Delta(K \times S \times U)$
defined by 




It is more convenient here to use $K \times S \times U$, but $\pi$ can be understood as a probability on
$K \times C \times D$. To simplify notation, let us identify $\Delta(K)$ with $[0, 1]$ with the convention that
$p \in \Delta(K)$ is identified with $p(k_1)$. If player 1 receives signal $(s_1, u_1)$, then $x_1 = 1$. If he
receives $(s_2, u_1)$ or $(s_2, u_2)$, then $x_1 = \frac{1}{2}$. Finally, if he receives $(s_3, u_1)$ or $(s_3, u_2)$, then
$x_1 = 0$. The value of $x_1$ depends only on his private signal.

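We now compute the second order beliefs of player 2. If player 2 receives $u_1$, then his beliefs
about the private signal of player 1 are $\frac{8}{12}\delta_{s_1} + \frac{2}{12}\delta_{s_2} + \frac{2}{12}\delta_{s_3}$, so that

$$y_1(u_1) = \frac{8}{12}\delta_{1} + \frac{2}{12}\delta_{\frac{1}{2}} + \frac{2}{12}\delta_{0}.$$

If player 2 receives $u_2$, then we obtain

$$y_1(u_2) = \frac{6}{12}\delta_{\frac{1}{2}} + \frac{6}{12}\delta_{0}.$$

To conclude, player 2 receives each signal with probability $\frac{1}{2}$, so that $\eta_1$ is equal to

$$\eta_1 = \frac{1}{2}\delta_{y_1(u_1)} + \frac{1}{2}\delta_{y_1(u_2)}.$$
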

Assumption (a2) will be split in two parts, (A2a) and (A2b). At first, we assume that
player 1 is able to compute the variable $y_1$, which is a constraint on the initial probability $\pi$
only.

(A2a) There exists a map $f_1 = f_1^{\pi} : C' \to \Delta(\Delta(K))$ such that $y_1 = f_1(c_1)$, $\pi$-almost surely.

Assuming (A2a), we can introduce a special class of strategies for player 1 which will be
needed for the second part of the formal assumption.

Definition 2.7. If $\pi \in \Delta_f(K \times \mathbb{N} \times \mathbb{N})$ fulfills (A2a), a strategy $\sigma \in \Sigma$ is
called a reduced strategy if it depends on the initial signal $c_1$ in $C'$ only through $(x_1, y_1)$. Let
$\Sigma'(\pi) \subset \Sigma$ denote the subset of reduced strategies.

The second part of (A2) requires that when player 1 is using a reduced strategy, the
variable $y_2$ has to be $h^{I}_2$-measurable. Formally:

(A2b) $\forall \pi \in \Delta_f(K \times \mathbb{N} \times \mathbb{N})$ satisfying (A2a), $\forall \sigma \in \Sigma'(\pi)$, $\forall \tau \in \mathcal{T}$,
$\exists f_2 = f_2^{\pi,\sigma,\tau} : \mathbb{N} \times I \times C \to \Delta(\Delta(K))$ such that $y_2 = f_2(c_1, i_1, c_2)$, $\mathbb{P}^{\pi}_{\sigma,\tau}$-almost surely.
The introduction of reduced strategies for player 1 is necessary in order to exclude irrelevant
correlations between players (see Example 4.1 in section 4). It will be shown in
Lemma 4.4 and in the proof of the main Theorem that there is no loss in restricting player 1 
to reduced strategies. 

In order to state the last assumption, we reduce the set of initial probabilities. 

Definition 2.8. Let $\Delta^*_f(K \times \mathbb{N} \times \mathbb{N})$ be the set of probability distributions
satisfying (A1a) and (A2a).

Assumption (a3) can now be formalized as
(A3) $\forall \pi \in \Delta^*_f(K \times \mathbb{N} \times \mathbb{N})$, $\forall \sigma \in \Sigma'(\pi)$, $\eta_2$ is independent of $\tau \in \mathcal{T}$.

Remark 2.9. Assumptions (A1), (A2) imply that the properties (A1a) and (A2a) of the initial
probability $\pi$ are preserved by the transition when player 1 plays reduced strategies. Precisely,
for all $(\sigma, \tau)$ with $\sigma$ reduced, the law of $(k_2, h^{I}_2, h^{II}_2)$ under $\mathbb{P}^{\pi}_{\sigma,\tau}$, seen as an
element of $\Delta_f(K \times \mathbb{N} \times \mathbb{N})$, belongs to the set $\Delta^*_f(K \times \mathbb{N} \times \mathbb{N})$. We will
prove in section 4 that even if the last two assumptions (A2b) and (A3) are stated in terms of $y_2$ and
$\eta_2$, it is possible to extend these properties by induction to $y_n$ and $\eta_n$ for appropriate strategies.
Thus the formal assumptions are coherent with the informal assumptions. In particular, player 1 can
compute the auxiliary variables $(x_2, y_2, \eta_2)$ without knowing the strategy of player 2 and therefore
play again a reduced strategy at the second stage (i.e. one which depends only on $(x_2, y_2)$).

3 Applications 

We present in this section several models which satisfy our assumptions. 

3.1 Partially Observable Markov Decision Processes. 

A POMDP is a one-player game, given by a tuple $(K, I, C, g, q, \pi)$, where $K$ is the state
space, $I$ is the action set, $C$ is the set of signals, $g : K \times I \to [0, 1]$ is the payoff function,
$q : K \times I \to \Delta(K \times C)$ is the transition function and $\pi$ is an initial distribution on
$K \times \mathbb{N}$. In the finite framework, the existence of the uniform value was proven by Rosenberg, Solan
and Vieille [11], and it was extended by Renault [9] to arbitrary sets of actions and signals
with the additional assumption that all the probabilities appearing in the transition or in the
definition of strategies have finite support. We will only consider here the finite case.

Formally, a POMDP can be seen as a repeated game in which player 2 is a dummy (i.e. his
action set $J$ is a singleton). Since player 2 has only one action, his information plays no role
here. The assumptions (A1)-(A3) obviously hold.

3.2 Repeated game with a perfectly informed controller. 

The model of a repeated game with an informed controller introduced by Renault [10] fulfils 
our assumptions. In this model, player 1 is perfectly informed of the state and of the signal 
of player 2, in the sense that he can deduce the true state variable and the signal of player 2 
from his signals. Moreover, the transition q is such that player 2 has no influence on the joint 
distribution of the pair made by the state variable and his signal. 

In [10], the sets of initial signals are $C' = C$ and $D' = D$. Formally, the first assumption
(i.e. that player 1 is perfectly informed of the state and of the signal of player 2) is given by
(HA') There exist two mappings $\hat{k} : C \to K$ and $\hat{d} : C \to D$ such that, if $E$ denotes
$\{(k, c, d) \in K \times C \times D : \hat{k}(c) = k, \ \hat{d}(c) = d\}$, then

$$\pi(E) = 1, \quad \text{and} \quad q(k, i, j)(E) = 1, \ \forall (k, i, j) \in K \times I \times J.$$

The second assumption is formalized by

(HB') Player 1 controls the transition in the sense that the marginal of the transition $q$ on
$K \times D$ does not depend on player 2's action. For $k \in K$, $i \in I$, $j \in J$, we denote by
$q(k, i)$ the marginal of $q(k, i, j)$ on $K \times D$.

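Let us check that this model satisfies our assumptions. Assuming that the initial distribution
$\pi \in \Delta(K \times C \times D)$ fulfils assumption (HA'), we have

$$y_1(d_1) = \sum_{c_1} \pi(c_1 \mid d_1) \, \delta_{\mathcal{L}_\pi(k_1 \mid c_1, d_1)} = \sum_{c_1} \pi(c_1 \mid d_1) \, \delta_{\delta_{\hat{k}(c_1)}} = \sum_{k_1} \pi(k_1 \mid d_1) \, \delta_{\delta_{k_1}},$$

where $\hat{k}$ is the mapping given by (HA'). We deduce that $\pi$ can be seen as an element of
$\Delta^*_f(K \times \mathbb{N} \times \mathbb{N})$. Formally, we have to verify our assumptions, starting from any
initial distribution in $\Delta^*_f(K \times \mathbb{N} \times \mathbb{N})$. From now on, initial
signals $(c_1, d_1)$ belong to arbitrary finite subsets of $\mathbb{N}$ denoted by $C'$, $D'$ as in the previous
section$^2$.

First, note that at any stage $n \geq 2$, player 1's first order belief is a Dirac mass
on the current state (i.e. $x_n = \delta_{k_n}$). Thus, adding the signal of player 2 to the signal of
player 1 does not change the beliefs of the latter, which proves (A1). It also implies that the
second order beliefs of player 2 can be identified with the first order beliefs of player 2. Let
$\pi \in \Delta^*_f(K \times \mathbb{N} \times \mathbb{N})$, $\tau$ be a strategy for player 2 and $\sigma$ a reduced strategy
for player 1. Recall that $\sigma$ is a function only of $(x_1, y_1)$, which are by assumption $c_1$-measurable,
and that $y_1$ is $d_1$-measurable, so that there exist two functions $h_1$ and $f_1$ such that, with probability 1,
$\mathcal{L}_\pi(x_1 \mid d_1) = y_1 = h_1(d_1) = f_1(c_1)$. It follows that

$$\mathbb{P}^{\pi}_{\sigma,\tau}(k_1, x_1, d_1, i_1, j_1, k_2, d_2) = \pi(d_1) \, h_1(d_1)(x_1) \, x_1(k_1) \, \sigma(x_1, h_1(d_1))(i_1) \, \tau(d_1)(j_1) \, q(k_1, i_1)(k_2, d_2).$$
We deduce that

$$\begin{aligned}
y_2(d_1, j_1, d_2) &= \sum_{c_1, i_1, c_2} \mathbb{P}^{\pi}_{\sigma,\tau}(c_1, i_1, c_2 \mid d_1, j_1, d_2) \, \delta_{\mathcal{L}_{\mathbb{P}^{\pi}_{\sigma,\tau}}(k_2 \mid c_1, i_1, c_2)} \\
&= \sum_{c_1, i_1, c_2} \mathbb{P}^{\pi}_{\sigma,\tau}(c_1, i_1, c_2 \mid d_1, j_1, d_2) \, \delta_{\delta_{\hat{k}(c_2)}} \\
&= \frac{\sum_{k_1, x_1, k_2, i_1} \pi(d_1) \, h_1(d_1)(x_1) \, x_1(k_1) \, \sigma(x_1, h_1(d_1))(i_1) \, q(k_1, i_1)(k_2, d_2) \, \delta_{\delta_{k_2}}}{\sum_{k_1, x_1, k_2, i_1} \pi(d_1) \, h_1(d_1)(x_1) \, x_1(k_1) \, \sigma(x_1, h_1(d_1))(i_1) \, q(k_1, i_1)(k_2, d_2)} \\
&= \frac{\sum_{k_1, x_1, k_2, i_1} f_1(c_1)(x_1) \, x_1(k_1) \, \sigma(x_1, f_1(c_1))(i_1) \, q(k_1, i_1)(k_2, d_2) \, \delta_{\delta_{k_2}}}{\sum_{k_1, x_1, k_2, i_1} f_1(c_1)(x_1) \, x_1(k_1) \, \sigma(x_1, f_1(c_1))(i_1) \, q(k_1, i_1)(k_2, d_2)},
\end{aligned}$$
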



2 One may easily reduce the analysis to a smaller set of initial probabilities, but we chose to keep this general 
formulation since the reduction does not really simplify the proofs. 
where the above equalities hold almost surely whenever the conditional probabilities are
well-defined. We deduce that $y_2$ is only a function of $c_1$ and $d_2$ which does not depend on $\tau$, so
that the function $f_2(c_1, c_2) = y_2(c_1, \hat{d}(c_2))$, where $\hat{d}$ is the mapping given by (HA'), proves
that assumption (A2) is satisfied. Finally, the distribution of the random variable $y_2$ is equal to

$$\eta_2 = \sum_{c_1, d_2} \mathbb{P}^{\pi}_{\sigma,\tau}(c_1, d_2) \, \delta_{y_2(c_1, d_2)}.$$

Using the preceding result and the fact that the marginal of $q$ on $D$ does not depend on the action
played by player 2, $\eta_2$ does not depend on $\tau$ and (A3) is satisfied.

Remark 3.1. In a previous work, Renault [8] studied the particular case where the state
follows a Markov chain $f : K \to \Delta(K)$, player 1 observes the state and both players observe
the actions. This is easily seen as a particular case of the above model. In a more recent
work, Neyman [7] proved the existence of the uniform value when allowing for any signalling
structure on the actions. This last result is not covered by our main theorem since in this
case, player 1 cannot control player 2's information about the state variable.

3.3 Player 1 is more informed about the state. 

In this last paragraph, we assume that actions are observed by both players after each stage.
Moreover, both players receive a public signal in a set $U$, player 1 receives a signal in a set $S$
and player 2 has no influence on the joint distribution of the state and signals in $K \times U \times S$.

Formally, it is a repeated game where $C = I \times J \times S \times U$, $D = I \times J \times U$ and the transition
function satisfies the following two conditions. At first, the signal $u$ is public and the actions
are observed:

$$\forall (k, i, j) \in K \times I \times J, \quad \sum_{(k', s, u) \in K \times S \times U} q(k, i, j)(k', (i, j, s, u), (i, j, u)) = 1.$$

Secondly, there exists a function $q$ from $K \times I$ to $\Delta(K \times S \times U)$ such that

$$\forall (k, k', i, j, s, u) \in K \times K \times I \times J \times S \times U, \quad q(k, i, j)(k', (i, j, s, u), (i, j, u)) = q(k, i)(k', s, u).$$

Let us stress that the transition $q$ itself depends on player 2, since it has to reveal his
actions, but as we will see our assumptions are still satisfied. It was already noticed in Renault
[10] that it is too restrictive to assume that the transition is fully controlled by player 1. This
model is a natural generalization of Renault's model, dropping the (important) condition that
player 1 knows the state at every stage. However, it does not allow for imperfect monitoring
of actions as in the previous examples.

Let us check that this model satisfies our assumptions. According to the description of the
model, an initial distribution $\pi$ of $(k_1, s_1, u_1)$ can be seen as an element of
$\Delta^*_f(K \times \mathbb{N} \times \mathbb{N})$ since the signal of player 2 is contained in the signal of player 1.
As for the previous example, we will start with a general initial probability
$\pi \in \Delta^*_f(K \times \mathbb{N} \times \mathbb{N})$ and initial signals $(c_1, d_1)$. At
first, note that apart from the initial signal $d_1$, histories of player 2 are contained in histories
of player 1, so that assumption (A1) reduces to

$$\forall n \geq 1, \ \forall (\sigma, \tau) \in \Sigma \times \mathcal{T}, \quad \mathcal{L}_{\mathbb{P}^{\pi}_{\sigma,\tau}}(k_n \mid h^{I}_n, d_1) = \mathcal{L}_{\mathbb{P}^{\pi}_{\sigma,\tau}}(k_n \mid h^{I}_n).$$
This property is true for $n = 1$ by assumption. Let us proceed by induction on $n$. Assume
that $n \geq 2$ and that the property is proved for $n - 1$, i.e. that

$$x_{n-1} = \mathcal{L}_{\mathbb{P}^{\pi}_{\sigma,\tau}}(k_{n-1} \mid h^{I}_{n-1}, d_1) = \mathcal{L}_{\mathbb{P}^{\pi}_{\sigma,\tau}}(k_{n-1} \mid h^{I}_{n-1}).$$

Again, let the two-argument function $q$ be linearly extended to $\Delta(K) \times I$. It follows that

$$\mathcal{L}_{\mathbb{P}^{\pi}_{\sigma,\tau}}(k_n, s_n, u_n \mid h^{I}_{n-1}, d_1, i_{n-1}, j_{n-1}) = q(x_{n-1}, i_{n-1}) = \mathcal{L}_{\mathbb{P}^{\pi}_{\sigma,\tau}}(k_n, s_n, u_n \mid h^{I}_{n-1}, i_{n-1}, j_{n-1}),$$

and (A1) then follows directly by disintegration. Moreover, we deduce that

$$\mathbb{P}^{\pi}_{\sigma,\tau}(k_n = k \mid h^{I}_n) = \frac{q(x_{n-1}, i_{n-1})(k, s_n, u_n)}{\sum_{k' \in K} q(x_{n-1}, i_{n-1})(k', s_n, u_n)}.$$

The latter proves that $x_2$ can be expressed as a function of $(x_1, i_1, s_2, u_2)$ which does not
depend on $\tau$. Recall then that by assumption there exist functions $h_1$ and $f_1$ such that with
probability 1, we have

$$y_1 = \mathcal{L}_\pi(x_1 \mid d_1) = h_1(d_1) = f_1(c_1).$$
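If player 1 uses a reduced strategy $\sigma$ and player 2 uses a strategy $\tau$, we have

$$\mathbb{P}^{\pi}_{\sigma,\tau}(k_1, x_1, d_1, i_1, j_1, k_2, s_2, u_2) = \pi(d_1) \, h_1(d_1)(x_1) \, x_1(k_1) \, \sigma(x_1, h_1(d_1))(i_1) \, \tau(d_1)(j_1) \, q(k_1, i_1)(k_2, s_2, u_2).$$
We deduce that

$$\begin{aligned}
y_2(d_1, i_1, j_1, u_2) &= \sum_{x_1, s_2} \mathbb{P}^{\pi}_{\sigma,\tau}(x_1, s_2 \mid d_1, i_1, j_1, u_2) \, \delta_{\mathcal{L}_{\mathbb{P}^{\pi}_{\sigma,\tau}}(k_2 \mid x_1, d_1, i_1, j_1, s_2, u_2)} \\
&= \sum_{x_1, s_2} \mathbb{P}^{\pi}_{\sigma,\tau}(x_1, s_2 \mid d_1, i_1, j_1, u_2) \, \delta_{x_2(x_1, i_1, s_2, u_2)}.
\end{aligned}$$

From the previous formula, we deduce

$$\begin{aligned}
\mathbb{P}^{\pi}_{\sigma,\tau}(x_1, s_2 \mid d_1, i_1, j_1, u_2) &= \frac{\sum_{k_1} \pi(d_1) \, h_1(d_1)(x_1) \, x_1(k_1) \, \sigma(x_1, h_1(d_1))(i_1) \, \tau(d_1)(j_1) \, q(k_1, i_1)(s_2, u_2)}{\sum_{k_1', x_1', s_2'} \pi(d_1) \, h_1(d_1)(x_1') \, x_1'(k_1') \, \sigma(x_1', h_1(d_1))(i_1) \, \tau(d_1)(j_1) \, q(k_1', i_1)(s_2', u_2)} \\
&= \frac{\sum_{k_1} f_1(c_1)(x_1) \, x_1(k_1) \, \sigma(x_1, f_1(c_1))(i_1) \, q(k_1, i_1)(s_2, u_2)}{\sum_{k_1', x_1', s_2'} f_1(c_1)(x_1') \, x_1'(k_1') \, \sigma(x_1', f_1(c_1))(i_1) \, q(k_1', i_1)(s_2', u_2)},
\end{aligned}$$

where $q(k_1, i_1)(s_2, u_2)$ denotes the marginal of $q(k_1, i_1)$ on $S \times U$.
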

Thus, $y_2$ does not depend on $\tau$ and player 1, knowing $i_1$, $c_1$ and $u_2$, can compute
$y_2$, which proves that assumption (A2) is satisfied. Finally, the distribution of the random
variable $y_2$ is equal to

$$\eta_2 = \sum_{c_1, i_1, u_2} \mathbb{P}^{\pi}_{\sigma,\tau}(c_1, i_1, u_2) \, \delta_{y_2(c_1, i_1, u_2)}.$$

Since the two-argument function $q$ does not depend on $j_1$, we deduce as above that $\eta_2$ does not
depend on $\tau$ and therefore that (A3) is satisfied.
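
The recursive formula for player 1's first order belief obtained in this section is a standard Bayesian filter, and it can be sketched in a few lines. The Python code below is a minimal illustration under stated assumptions: the function name, the dictionary encoding of the controlled transition on $K \times I$, and the noisy-observation toy data are all hypothetical.

```python
from fractions import Fraction

def update_belief(x_prev, i, s, u, q, K):
    """One Bayesian update of player 1's first order belief.

    x_prev : {k: prob}, belief on the previous state
    q      : {(k, i): {(k', s, u): prob}}, transition controlled by player 1
    Returns the belief on the new state after playing i and observing (s, u).
    """
    # Linear extension of q to beliefs, evaluated at (k', s, u) for each k'.
    weights = {
        k_new: sum(x_prev[k] * q[(k, i)].get((k_new, s, u), 0) for k in K)
        for k_new in K
    }
    total = sum(weights.values())
    if total == 0:
        raise ValueError("the observed signal has probability zero under x_prev")
    return {k_new: w / total for k_new, w in weights.items()}

# Hypothetical toy data: the state never moves, the private signal reports the
# new state correctly with probability 3/4, and the public signal is constant.
K = (0, 1)
q = {
    (0, "i"): {(0, "s0", "u"): Fraction(3, 4), (0, "s1", "u"): Fraction(1, 4)},
    (1, "i"): {(1, "s1", "u"): Fraction(3, 4), (1, "s0", "u"): Fraction(1, 4)},
}
prior = {0: Fraction(1, 2), 1: Fraction(1, 2)}
posterior = update_belief(prior, "i", "s0", "u", q, K)
```

The update uses only player 1's own action and signals, which reflects the fact that his belief does not depend on the strategy of player 2.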
4 Discussion of the assumptions

In this section, we discuss several implications of the formal assumptions (A1), (A2), (A3).
At first, we show that it is necessary to introduce the notion of reduced strategy in order
to exclude irrelevant correlations between the players. Then, to address a
suggestion made in [10], we show that the analysis cannot be made in terms of first order
beliefs only, and that it is necessary to introduce second order beliefs. Lemma 4.4 shows that
the value can be expressed as a function of second order beliefs. Then, we prove that player
1 can compute his first order beliefs, the second order beliefs of player 2 and the distribution
of these beliefs without knowing the strategy of player 2 as soon as he plays a Markovian
strategy with respect to the beliefs at each stage. Finally, we give a weaker version of the
theorem where the assumptions are formulated more directly in terms of the data of the game.

4.1 Necessity of reduced strategies. 

The introduction of reduced strategies for player 1 is necessary in order to exclude irrelevant
correlations between players, as shown in Example 4.1 below. It will be shown in Lemma 4.4
and in the main Theorem that in our model, there is no loss in restricting player 1 to reduced
strategies.

Example 4.1. Let $K = \{k_1, k_2\}$, $I = \{T, B\}$, $C = \{a, b\}$, $D = \{\alpha, \beta\}$ and $J$ any finite set.
The transition q depends only on the action of player 1 and is described by the matrices 

t ( s kl \ ns kl + is k2 \ 

B \lS kl + \b k J V 5 k2 J • . 
ki k 2 

At each stage (including the initial stage), the signals of the players are randomly chosen
independently of the state variable with distribution

$$\begin{array}{c|cc} & \alpha & \beta \\ \hline a & 1/6 & 2/6 \\ b & 2/6 & 1/6 \end{array}$$

It is clear that signals do not contain any information on the state variable. However, assume 
that the initial state is k% and that player 1 plays at the first stage action T if he receives the 
signal $a$ and action $B$ if he receives the signal $b$. The second order beliefs $y_2$ of player 2 will
differ according to whether his initial private signal is $\alpha$ or $\beta$. Since player 1 is not able to
compute the initial signal of player 2, he is not able to compute the variable $y_2$ at the second stage.
Nevertheless, when considering reduced strategies, signals can be omitted and player 1 is able
to compute the beliefs of player 2, which implies that (A2) is satisfied.

4.2 Second-order beliefs 

Renault [10] conjectured that the pair of distributions of first order beliefs of both players
could be sufficient auxiliary variables. We present here an example showing the necessity to
take into account second order beliefs in the sense that there exist a game and two initial
probabilities $\pi$ and $\pi'$ such that the laws of first-order beliefs are the same under $\pi$ and $\pi'$
while the values differ.

Example 4.2. We consider again the situation of Example 2.6. Recall that $K = \{k_1, k_2\}$, that
there are two public signals $U = \{u_1, u_2\}$ available to both players and three private signals
$S = \{s_1, s_2, s_3\}$ for player 1. The set of signals of player 1 is $C = U \times S$ and the set of signals
of player 2 is $D = U$. Let $\pi, \pi' \in \Delta(K \times S \times U)$ be defined by



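[Tables of $\pi$ and $\pi'$, with columns indexed by the public signals $u_1, u_2$ within each state $k_1, k_2$; the entries are missing here.]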



We will identify A(K) and [0,1] as in example 2.6. The beliefs of Player 2 about the state 
are the same in both cases and are equal to |<53 + \8x. Similarly the beliefs of Player 1 

are + |<5i + in both games. Moreover player 1 observes the signal of player 2, so 
that assumption (A2) is satisfied. Thus, the laws of first-order beliefs are not sufficient to
discriminate between $\pi$ and $\pi'$. Let $\Gamma$ be the repeated game where $K = \{k_1, k_2\}$, $I = \{T, B\}$,
$J = \{L, R\}$ and payoff $g$ given by

C i) G i). 

k\ k2 

The average payoff matrix with coefficients (nj o) * s 

Let us prove that $v_1(\pi)$ and $v_1(\pi')$ are different. If player 1 receives $s_3$, then his beliefs on
the state is and thus Top is a weakly dominant action. If he receives s\ or S2, his beliefs is 
1 or I and Bottom is a strictly dominant action. From the point of view of player 2, playing 
Left if receiving $u_1$ and playing Right if receiving $u_2$ is a best reply to this strategy. Using
these strategies, we find that ui(tt) = | and fi(vr') = j|. 

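Let us prove that if assumptions (A1a) and (A2a) hold, then $v_\theta(\pi)$ depends only on the
law of second order beliefs of player 2.

Definition 4.3. For all $\pi \in \Delta_f^*(K \times \mathbb{N} \times \mathbb{N})$, define

\[ \Phi(\pi) \triangleq \mathcal{L}_\pi\big(\mathcal{L}_\pi(\mathcal{L}_\pi(k_1 \mid c_1) \mid d_1)\big). \]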
Lemma 4.4. Let $\pi, \pi' \in \Delta_f^*(K \times \mathbb{N} \times \mathbb{N})$. If $\Phi(\pi) = \Phi(\pi')$, then $v_\theta(\pi) = v_\theta(\pi')$.

Proof. Let $(\sigma, \tau)$ be a pair of behavior strategies in $\Gamma_\theta(\pi)$. It is enough to show that $v_\theta(\pi)$ depends
on $\pi$ only through $\eta_1 = \Phi(\pi)$. Recall that $x_1 := \mathcal{L}_\pi(k_1 \mid c_1)$ and $y_1 := \mathcal{L}_\pi(\mathcal{L}_\pi(k_1 \mid c_1) \mid d_1)$.
Note that $y_1$ is a function of $d_1$, and that $x_1$ is a function of $c_1$. Moreover, by assumption
(A2a), there exists a map $f_1 : C \to \Delta(\Delta(K))$ such that

\[ f_1(c_1) = y_1 \quad \pi\text{-almost surely.} \]

Let us construct a reduced version of the game $\Gamma_\theta(\pi)$ in which player 1 and player 2 are
constrained to choose strategies that depend only on $c_1$ and $d_1$ through the variables $(x_1, y_1)$
and $y_1$ respectively, keeping the same payoff function. This game has a value since the
set of possible values of $(x_1, y_1)$ is finite, and this value is exactly $v_\theta(\tilde\pi)$, where $\tilde\pi$
is the joint distribution of $(k_1, (x_1, y_1), y_1)$ seen as an element of $\Delta_f^*(K \times \mathbb{N} \times \mathbb{N})$.

The sets of strategies in $\Gamma_\theta(\tilde\pi)$ (denoted by $\Sigma'(\pi)$ and $\mathcal{T}'(\pi)$) can be seen as subsets of $\Sigma$
and $\mathcal{T}$ via the previous identification and we will prove that both games have the same value
and that $v_\theta(\pi)$ depends only on $\eta_1$.

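Assume at first that $\tau \in \mathcal{T}'(\pi)$ and $\sigma \in \Sigma$ and let $\mu$ denote the joint law of $(k_1, c_1, d_1, x_1, y_1)$
induced by $\pi$. By disintegration, we have

\[
\begin{aligned}
\gamma_\theta(\pi, \sigma, \tau)
&= \int_{K \times \mathbb{N} \times \Delta(K) \times \Delta(\Delta(K))} \gamma_\theta(k_1, \sigma(c_1), \tau(y_1)) \, d\mu(k_1, c_1, x_1, y_1) \\
&= \int_{\mathbb{N} \times \Delta(K) \times \Delta(\Delta(K))} \Big( \int_K \gamma_\theta(k_1, \sigma(c_1), \tau(y_1)) \, d\mathcal{L}_\mu(k_1 \mid c_1, x_1, y_1) \Big) d\mu(c_1, x_1, y_1) \\
&= \int_{\mathbb{N} \times \Delta(K) \times \Delta(\Delta(K))} \int_K \gamma_\theta(k_1, \sigma(c_1), \tau(y_1)) \, d\mathcal{L}_\mu(k_1 \mid c_1) \, d\mu(c_1, x_1, y_1) \\
&= \int_{\mathbb{N} \times \Delta(K) \times \Delta(\Delta(K))} \langle \gamma_\theta(\cdot, \sigma(c_1), \tau(y_1)), x_1 \rangle_{\mathbb{R}^K} \, d\mu(c_1, x_1, y_1),
\end{aligned}
\]

where we used that $\mathcal{L}_\mu(k_1 \mid c_1, x_1, y_1) = \mathcal{L}_\mu(k_1 \mid c_1)$ since $(x_1, y_1)$ are $c_1$-measurable and the
notations $\gamma_\theta(\cdot, \sigma(c_1), \tau(y_1))$ for $(\gamma_\theta(k, \sigma(c_1), \tau(y_1)))_{k \in K} \in \mathbb{R}^K$ and $\langle \cdot, \cdot \rangle_{\mathbb{R}^K}$ for the scalar product
in $\mathbb{R}^K$. Taking the supremum over all strategies of player 1, we obtain

\[
\sup_{\sigma \in \Sigma} \gamma_\theta(\pi, \sigma, \tau) = \int_{\mathbb{N} \times \Delta(K) \times \Delta(\Delta(K))} \sup_{\sigma(c_1)} \langle \gamma_\theta(\cdot, \sigma(c_1), \tau(y_1)), x_1 \rangle_{\mathbb{R}^K} \, d\mu(c_1, x_1, y_1).
\]

The supremum inside the integral is achieved by strategies depending only on $(x_1, y_1)$ since
these variables are $c_1$-measurable. It means that there exists an optimal strategy in $\Sigma'(\pi)$,
which proves

\[
\inf_{\tau \in \mathcal{T}'(\pi)} \sup_{\sigma \in \Sigma} \gamma_\theta(\pi, \sigma, \tau) = \inf_{\tau \in \mathcal{T}'(\pi)} \sup_{\sigma \in \Sigma'(\pi)} \gamma_\theta(\pi, \sigma, \tau).
\]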

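Moreover the value of the reduced game depends only on $\eta_1$ since, taking the infimum over
$\tau \in \mathcal{T}'(\pi)$,

\[
\begin{aligned}
\inf_{\tau \in \mathcal{T}'(\pi)} \sup_{\sigma \in \Sigma'(\pi)} \gamma_\theta(\pi, \sigma, \tau)
&= \int_{\Delta(\Delta(K))} \inf_{\tau(y_1)} \Big( \int_{\Delta(K)} \sup_{\sigma(x_1, y_1)} \langle \gamma_\theta(\cdot, \sigma(x_1, y_1), \tau(y_1)), x_1 \rangle_{\mathbb{R}^K} \, d\mathcal{L}_\mu(x_1 \mid y_1) \Big) d\mu(y_1) \qquad (4.1) \\
&= \int_{\Delta(\Delta(K))} \inf_{\tau(y_1)} \Big( \int_{\Delta(K)} \sup_{\sigma(x_1, y_1)} \langle \gamma_\theta(\cdot, \sigma(x_1, y_1), \tau(y_1)), x_1 \rangle_{\mathbb{R}^K} \, dy_1(x_1) \Big) d\eta_1(y_1), \qquad (4.2)
\end{aligned}
\]

which depends only on the law of $y_1$, since $y_1 = \mathcal{L}_\mu(x_1 \mid d_1) = \mathcal{L}_\mu(x_1 \mid y_1)$.
Let us prove a dual equality starting with $\sigma \in \Sigma'(\pi)$ and $\tau \in \mathcal{T}$: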



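\[
\begin{aligned}
\gamma_\theta(\pi, \sigma, \tau)
&= \int_{K \times \mathbb{N} \times \mathbb{N} \times \Delta(K) \times \Delta(\Delta(K))} \gamma_\theta(k_1, \sigma(x_1, y_1), \tau(d_1)) \, d\mu(k_1, c_1, d_1, x_1, y_1) \\
&= \int_{\mathbb{N} \times \mathbb{N} \times \Delta(K) \times \Delta(\Delta(K))} \Big( \int_K \gamma_\theta(k_1, \sigma(x_1, y_1), \tau(d_1)) \, d\mathcal{L}_\mu(k_1 \mid c_1, d_1, x_1, y_1) \Big) d\mu(c_1, d_1, x_1, y_1) \\
&= \int_{\mathbb{N} \times \mathbb{N} \times \Delta(K) \times \Delta(\Delta(K))} \langle \gamma_\theta(\cdot, \sigma(x_1, y_1), \tau(d_1)), x_1 \rangle_{\mathbb{R}^K} \, d\mu(c_1, d_1, x_1, y_1) \\
&= \int_{\mathbb{N} \times \Delta(K) \times \Delta(\Delta(K))} \langle \gamma_\theta(\cdot, \sigma(x_1, y_1), \tau(d_1)), x_1 \rangle_{\mathbb{R}^K} \, d\mu(d_1, x_1, y_1) \\
&= \int_{\mathbb{N} \times \Delta(\Delta(K))} \Big( \int_{\Delta(K)} \langle \gamma_\theta(\cdot, \sigma(x_1, y_1), \tau(d_1)), x_1 \rangle_{\mathbb{R}^K} \, d\mathcal{L}_\mu(x_1 \mid d_1, y_1) \Big) d\mu(d_1, y_1).
\end{aligned}
\]

For the second equality, we used that $\mathcal{L}_\mu(k_1 \mid c_1, d_1, x_1, y_1) = \mathcal{L}_\mu(k_1 \mid c_1, d_1) = \mathcal{L}_\mu(k_1 \mid c_1) = x_1$,
which follows from the fact that $(x_1, y_1)$ is $c_1$-measurable and assumption (A1). Taking the
infimum over all $\tau \in \mathcal{T}$, it follows that

\[
\inf_{\tau \in \mathcal{T}} \gamma_\theta(\pi, \sigma, \tau) = \int_{\mathbb{N} \times \Delta(\Delta(K))} \inf_{\tau(d_1)} \Big( \int_{\Delta(K)} \langle \gamma_\theta(\cdot, \sigma(x_1, y_1), \tau(d_1)), x_1 \rangle_{\mathbb{R}^K} \, d\mathcal{L}_\mu(x_1 \mid d_1) \Big) d\mu(d_1, y_1). \qquad (4.3)
\]

The infimum inside the integral is achieved for strategies depending only on $y_1 = \mathcal{L}_\mu(x_1 \mid d_1)$
since $y_1$ is $d_1$-measurable. We proved that

\[
\sup_{\sigma \in \Sigma'(\pi)} \inf_{\tau \in \mathcal{T}} \gamma_\theta(\pi, \sigma, \tau) = \sup_{\sigma \in \Sigma'(\pi)} \inf_{\tau \in \mathcal{T}'(\pi)} \gamma_\theta(\pi, \sigma, \tau).
\]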

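Finally, using that $\mathcal{T}'(\pi) \subset \mathcal{T}$ and $\Sigma'(\pi) \subset \Sigma$, it follows that the max-min value $\underline{v}_\theta(\pi)$ and
the min-max value $\overline{v}_\theta(\pi)$ of $\Gamma_\theta(\pi)$ satisfy

\[
\begin{aligned}
\underline{v}_\theta(\pi) &= \sup_{\sigma \in \Sigma} \inf_{\tau \in \mathcal{T}} \gamma_\theta(\pi, \sigma, \tau) \geq \sup_{\sigma \in \Sigma'(\pi)} \inf_{\tau \in \mathcal{T}} \gamma_\theta(\pi, \sigma, \tau) = v_\theta(\tilde\pi), \\
\overline{v}_\theta(\pi) &= \inf_{\tau \in \mathcal{T}} \sup_{\sigma \in \Sigma} \gamma_\theta(\pi, \sigma, \tau) \leq \inf_{\tau \in \mathcal{T}'(\pi)} \sup_{\sigma \in \Sigma} \gamma_\theta(\pi, \sigma, \tau) = v_\theta(\tilde\pi),
\end{aligned}
\]

where $v_\theta(\tilde\pi)$ is the value of the reduced game. This proves the equality $v_\theta(\pi) = v_\theta(\tilde\pi)$. Since
the value of the reduced game depends only on $\eta_1$, the proof is complete. □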
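
For a finitely supported $\pi$, the map $\Phi$ of Definition 4.3 can be computed by two successive conditionings. The following Python sketch is only an illustration: the dictionary encoding of $\pi$, the helper names and the three test distributions are assumptions made for this example, not objects from the paper.

```python
from collections import defaultdict
from fractions import Fraction


def conditional_laws(joint):
    """Given a joint law {(u, v): prob}, return the marginal of v and, for
    each v of positive mass, the law of u given v as a hashable frozenset."""
    marginal = defaultdict(Fraction)
    for (u, v), p in joint.items():
        marginal[v] += p
    cond = defaultdict(lambda: defaultdict(Fraction))
    for (u, v), p in joint.items():
        if p > 0:
            cond[v][u] += p / marginal[v]
    frozen = {v: frozenset(law.items()) for v, law in cond.items()}
    return dict(marginal), frozen


def phi(pi):
    """Phi(pi) for a finitely supported pi = {(k, c, d): prob}."""
    # First-order beliefs x_1(c): law of k given c.
    kc = defaultdict(Fraction)
    for (k, c, d), p in pi.items():
        kc[(k, c)] += p
    _, x = conditional_laws(kc)
    # Second-order beliefs y_1(d): law of x_1 given d.
    xd = defaultdict(Fraction)
    for (k, c, d), p in pi.items():
        if p > 0:
            xd[(x[c], d)] += p
    marginal_d, y = conditional_laws(xd)
    # eta_1: law of y_1.
    eta = defaultdict(Fraction)
    for d, p in marginal_d.items():
        if p > 0:
            eta[y[d]] += p
    return dict(eta)


half, quarter = Fraction(1, 2), Fraction(1, 4)
# Player 1 learns the state, player 2 learns nothing.
pi_a = {("k1", "c1", "d"): half, ("k2", "c2", "d"): half}
# Same information with a redundant extra signal c3 for player 1.
pi_b = {("k1", "c1", "d"): quarter, ("k1", "c3", "d"): quarter,
        ("k2", "c2", "d"): half}
# Both players learn the state.
pi_c = {("k1", "c1", "d1"): half, ("k2", "c2", "d2"): half}
```

Here `phi(pi_a)` and `phi(pi_b)` coincide, so by Lemma 4.4 the two games have the same value, while `phi(pi_c)` is a different element of $\Delta_f(\Delta_f(\Delta(K)))$.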

4.3 Player 1 can compute his beliefs without knowing player 2's strategy. 

Assumption (A1) can be reformulated as a pair of assumptions (A1a) and (A1b), which are
expressed in terms of $\pi$, i.e. the initial information, and $q$, i.e. the evolution of the information
structure for stages $m \geq 2$, respectively.

(A1a) The probability $\pi \in \Delta(K \times C' \times D')$ is such that

\[ \forall (k, c', d') \in K \times C' \times D', \quad \pi(c')\,\pi(k, c', d') = \pi(k, c')\,\pi(c', d'). \]

(A1b) There exists a map $F$ from $\Delta(K) \times I \times C$ to $\Delta(K)$ such that

\[ \forall (p, i, j, c, d, k) \in \Delta(K) \times I \times J \times C \times D \times K, \quad q(p, i, j)[k, c, d] = F(p, i, c)[k] \sum_{k' \in K} q(p, i, j)[k', c, d]. \]

Note that (A1a) is equivalent to $\pi(k \mid c', d') = \pi(k \mid c')$ for any $(k, c', d')$ such that $\pi(c', d') > 0$.
Similarly (A1b) could be written in terms of conditional probabilities, though we would then
have to distinguish events with probability 0. In addition, it highlights the first important consequence
of assumption (A1): player 1 can compute his beliefs about the state variable (i.e. the
conditional distribution in the right-hand-side of (A1)) without knowing the strategy, nor the
signals, of his opponent.

Proposition 4.5. Assuming (A1), player 1 can compute $x_n$ for each $n \geq 1$ without knowing
the strategy of player 2.

The proof of the Proposition follows directly from the following Lemma.

Lemma 4.6. Assumptions (A1) and (A1a + A1b) are equivalent. Furthermore, the map $F$
from $\Delta(K) \times I \times C$ to $\Delta(K)$ defined in (A1b) is such that for all $n \geq 2$ and for all strategy
profiles $(\sigma, \tau)$

\[ x_n = F(x_{n-1}, i_{n-1}, c_n), \quad \mathbb{P}^\pi_{\sigma,\tau}\text{-almost surely.} \]
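
The recursion of Lemma 4.6 is a Bayesian filter that player 1 can run on his own data. The sketch below uses a hypothetical kernel built to satisfy (A1b): the new state depends on $(k, i)$, player 1's signal depends on the new state, and player 2's signal is a garbling of player 1's signal that may depend on $j$. All tables (`move`, `observe`, `garble`) are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product

K = ["k1", "k2"]
C = ["c1", "c2"]
D = ["d1", "d2"]
J = ["L", "R"]

half, third = Fraction(1, 2), Fraction(1, 3)

# Hypothetical kernel with the product structure required by (A1b).
move = {"T": {"k1": {"k1": 1, "k2": 0}, "k2": {"k1": half, "k2": half}},
        "B": {"k1": {"k1": half, "k2": half}, "k2": {"k1": 0, "k2": 1}}}
observe = {"k1": {"c1": 2 * third, "c2": third},
           "k2": {"c1": third, "c2": 2 * third}}
garble = {"L": {"c1": {"d1": 1, "d2": 0}, "c2": {"d1": 0, "d2": 1}},
          "R": {"c1": {"d1": half, "d2": half}, "c2": {"d1": half, "d2": half}}}


def q(p, i, j):
    """Joint law of (k', c, d) when the current state has law p."""
    return {(k2, c, d): sum(p[k] * move[i][k][k2] for k in K)
            * observe[k2][c] * garble[j][c][d]
            for k2, c, d in product(K, C, D)}


def posterior(p, i, c, j, d):
    """Law of k' given (c, d); None when (c, d) has probability zero."""
    law = q(p, i, j)
    mass = sum(law[(k2, c, d)] for k2 in K)
    if mass == 0:
        return None
    return {k2: law[(k2, c, d)] / mass for k2 in K}


def F(p, i, c):
    """Belief update of player 1: common value of the posteriors over (j, d)."""
    values = [posterior(p, i, c, j, d) for j in J for d in D]
    values = [v for v in values if v is not None]
    assert all(v == values[0] for v in values), "(A1b) fails for this kernel"
    return values[0]


x1 = {"k1": half, "k2": half}
x2 = F(x1, "T", "c1")   # belief after playing T and observing c1
x3 = F(x2, "B", "c2")   # belief after playing B and observing c2
```

The internal check in `F` verifies, for this kernel, that the posterior depends neither on player 2's action nor on his signal, which is what allows player 1 to compute $x_n$ without knowing $\tau$.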

Proof. Using the definition of conditional independence, assumption (A1) at stage 1 is equivalent
to (A1a). It remains to prove that (A1) for $n \geq 2$ implies (A1b) and the converse.
Assume that $\pi$ fulfils (A1a) and let $(\sigma_1, \tau_1) \in \Delta(I)^C \times \Delta(J)^D$ be strategies with full support.
By construction, we have

\[ \mathcal{L}_{\mathbb{P}^\pi_{\sigma,\tau}}(k_2, c_2, d_2 \mid k_1, c_1, i_1, d_1, j_1) = q(k_1, i_1, j_1) \in \Delta(K \times C \times D). \]

It follows, using the tower property of conditional expectation and (A1a), that

\[ \mathcal{L}_{\mathbb{P}^\pi_{\sigma,\tau}}(k_2, c_2, d_2 \mid c_1, i_1, d_1, j_1) = q(x_1, i_1, j_1), \]

where, by definition, $x_1$ can be written as a function of $c_1$. On one hand, one obtains by
disintegration

\[ \mathbb{P}^\pi_{\sigma,\tau}(k_2 = k \mid c_2, d_2, c_1, i_1, d_1, j_1) \Big( \sum_{k' \in K} q(x_1, i_1, j_1)[k', c_2, d_2] \Big) = q(x_1, i_1, j_1)[k, c_2, d_2]. \]

On the other hand, the conditional law $\mathcal{L}_{\mathbb{P}^\pi_{\sigma,\tau}}(k_2 \mid c_1, i_1, c_2)$ is characterized by the following
expression

\[
\begin{aligned}
&\mathbb{P}^\pi_{\sigma,\tau}(k_2 = k \mid c_1, i_1, c_2) \Big( \sum_{k', d_1, d_2, j_1} \pi(c_1, d_1)\,\tau_1(d_1)[j_1]\,q(x_1(c_1), i_1, j_1)[k', c_2, d_2] \Big) \\
&\qquad = \sum_{d_1, d_2, j_1} \pi(c_1, d_1)\,\tau_1(d_1)[j_1]\,q(x_1(c_1), i_1, j_1)[k, c_2, d_2].
\end{aligned}
\]

Assumption (A1) for $n = 2$ implies that these two conditional probabilities are equal, which
in turn implies

\[
\frac{q(x_1, i_1, j_1)[k, c_2, d_2]}{\sum_{k' \in K} q(x_1, i_1, j_1)[k', c_2, d_2]}
= \frac{\sum_{d_1', d_2', j_1'} \pi(c_1, d_1')\,\tau_1(d_1')[j_1']\,q(x_1(c_1), i_1, j_1')[k, c_2, d_2']}{\sum_{k', d_1', d_2', j_1'} \pi(c_1, d_1')\,\tau_1(d_1')[j_1']\,q(x_1(c_1), i_1, j_1')[k', c_2, d_2']} \qquad (4.4)
\]

whenever the left-hand side is well-defined. Since $\tau_1$ has full support, this implies that the
right-hand side is also well-defined in this case and does not depend on $d_1, j_1, d_2$. Moreover,
for all $p \in \Delta(K)$, we can choose an initial distribution $\pi$ such that $\pi(x_1 = p) > 0$. It follows
that there exists a function $F$ such that

\[ F(p, i, c)[k] = \frac{q(p, i, j)[k, c, d]}{\sum_{k' \in K} q(p, i, j)[k', c, d]} \]

whenever the right hand side is well-defined for some $(j, d)$, and extended arbitrarily
otherwise.

For the converse assertion, we already mentioned that (A1a) implies (A1) for $n = 1$. We
are therefore allowed to write the following formula for the conditional laws,

\[ \mathbb{P}^\pi_{\sigma,\tau}(k_2 = k \mid c_2, d_2, c_1, i_1, d_1, j_1) = \frac{q(x_1, i_1, j_1)[k, c_2, d_2]}{\sum_{k' \in K} q(x_1, i_1, j_1)[k', c_2, d_2]}. \qquad (4.5) \]

It follows therefore that

\[ \mathbb{P}^\pi_{\sigma,\tau}(k_2 = k \mid c_2, d_2, c_1, i_1, d_1, j_1) = F(x_1, i_1, c_2)[k], \quad \mathbb{P}^\pi_{\sigma,\tau}\text{-almost surely,} \]

and since the right-hand-side is measurable with respect to the history of player 1, we have
the equality

\[
\begin{aligned}
\mathbb{P}^\pi_{\sigma,\tau}(k_2 = k \mid c_2, c_1, i_1) &= \mathbb{E}\big[\mathbb{P}^\pi_{\sigma,\tau}(k_2 = k \mid c_2, d_2, c_1, i_1, d_1, j_1) \mid c_1, i_1, c_2\big] \\
&= F(x_1, i_1, c_2)[k] \\
&= \mathbb{P}^\pi_{\sigma,\tau}(k_2 = k \mid c_2, d_2, c_1, i_1, d_1, j_1),
\end{aligned}
\]

which proves (A1) and our last assertion for $n = 2$. Finally the distribution of $(k_2, (c_1, i_1, c_2), (d_1, j_1, d_2))$,
seen as an element of $\Delta_f(K \times \mathbb{N} \times \mathbb{N})$, fulfils (A1a). Applying exactly the same argument
with these new initial signals allows us therefore to conclude by induction on $n$. □



4.4 Player 1 can compute the beliefs of player 2. 

The assumptions (A1) and (A2) are independent, as shown in Example 4.7 below. However,
(A2) really makes sense only when player 1 is better informed.

Example 4.7. Let $\Gamma = (K, I, J, C, D, q, g)$ be such that player 1 is in the dark and player 2
is perfectly informed: $K = \{\alpha, \beta\}$, $I$ and $J$ are finite, $C$ is a singleton $\{c\}$ and $D = K$. The
payoff mapping is arbitrary and the state is randomly chosen at each stage with probability
$(1/2, 1/2)$. Player 1 observes nothing and player 2 learns the state. It is clear that player 1's
signal is less accurate than player 2's, so that assumption (A1) is not satisfied. On the other
hand, (A2) is satisfied since player 1 knows the belief of player 2 about player 1's own belief, which is $(1/2, 1/2)$
whatever the signals are.

Under the assumptions (A1) and (A2), if player 1 plays a reduced strategy, he can compute
$y_2$, the belief of player 2 about his own belief on the state, without knowing the strategy of
player 2.

Lemma 4.8. Assume (A1b) and (A2b), and let $\pi \in \Delta_f^*(K \times \mathbb{N} \times \mathbb{N})$. Then, for all $\sigma \in \Sigma'(\pi)$,
there exists a map $f_2 = f_2^{\pi,\sigma}$ such that for all $\tau \in \mathcal{T}$

\[ y_2 = f_2(h_2^1), \quad \mathbb{P}^\pi_{\sigma,\tau}\text{-almost surely.} \]

Proof. It is sufficient to prove that the map $f_2$ appearing in (A2b) does not depend on $\tau$. Note
that since we assumed (A1), we have $x_2 = F(x_1(c_1), i_1, c_2)$ almost surely, where $F$ is defined
in (A1b). Moreover, the conditional probability

\[
\begin{aligned}
&\mathbb{P}^\pi_{\sigma,\tau}(\mathbf{c}_1 = c_1, \mathbf{i}_1 = i_1, \mathbf{c}_2 = c_2 \mid d_1, j_1, d_2) \\
&\quad = \frac{\pi(c_1, d_1)\,\sigma(x_1(c_1), y_1(d_1))(i_1)\,\tau(d_1)(j_1)\,q(x_1(c_1), i_1, j_1)(c_2, d_2)}{\sum_{c_1', i_1', c_2'} \pi(c_1', d_1)\,\sigma(x_1(c_1'), y_1(d_1))(i_1')\,\tau(d_1)(j_1)\,q(x_1(c_1'), i_1', j_1)(c_2', d_2)} \\
&\quad = \frac{\pi(c_1, d_1)\,\sigma(x_1(c_1), y_1(d_1))(i_1)\,q(x_1(c_1), i_1, j_1)(c_2, d_2)}{\sum_{c_1', i_1', c_2'} \pi(c_1', d_1)\,\sigma(x_1(c_1'), y_1(d_1))(i_1')\,q(x_1(c_1'), i_1', j_1)(c_2', d_2)}
\end{aligned}
\]

does not depend on $\tau$. There exists therefore a map $y_2(d_1, j_1, d_2)$ which does not depend on
$\tau$, defined by the above expression everywhere it makes sense and arbitrarily elsewhere. Let
$\tau^*$ be a strategy with full support. Using (A2b), there exists a map $f_2^{\pi,\sigma,\tau^*}$ such that

\[ y_2(d_1, j_1, d_2) = f_2^{\pi,\sigma,\tau^*}(c_1, i_1, c_2), \quad \mathbb{P}^\pi_{\sigma,\tau^*}\text{-almost surely.} \]

The previous computation shows that the conditional law of $(c_1, i_1, c_2)$ given $(d_1, j_1, d_2)$ does
not depend on $\tau$. Therefore, if the event

\[ \{\mathbf{d}_1 = d_1, \mathbf{j}_1 = j_1, \mathbf{d}_2 = d_2, \mathbf{c}_1 = c_1, \mathbf{i}_1 = i_1, \mathbf{c}_2 = c_2\} \]

has positive probability under $\mathbb{P}^\pi_{\sigma,\tau}$, it also has positive probability under $\mathbb{P}^\pi_{\sigma,\tau^*}$. We deduce
that $f_2^{\pi,\sigma,\tau} = f_2^{\pi,\sigma,\tau^*}$, $\mathbb{P}^\pi_{\sigma,\tau}$-almost surely for all $\tau$, which concludes the proof. □

Let us now prove that player 1 is able to play a strategy which is Markovian with respect
to the beliefs. The idea is to prove by induction that if player 1 plays a strategy which depends
at stage $n - 1$ only on $(x_{n-1}, y_{n-1})$, then he can compute the variables $(x_n, y_n)$ at stage $n$
and play at stage $n$ a strategy which depends only on $(x_n, y_n)$, and so on. Formally, we have the
following.

Lemma 4.9. For all $\pi \in \Delta_f^*(K \times \mathbb{N} \times \mathbb{N})$, and for any sequence of $\Delta(I)$-valued measurable
functions $\psi_1, \psi_2, \ldots$ defined on $\Delta(K) \times \Delta_f(\Delta(K))$, there exists a strategy $\sigma$ such that for all $\tau$
and for all $n$

\[ \sigma(h_n^1) = \psi_n(x_n, y_n), \quad \mathbb{P}^\pi_{\sigma,\tau}\text{-almost surely.} \]

Proof. We will prove the result by induction. It is obviously true for $n = 1$ due to the
definition of $\Delta_f^*(K \times \mathbb{N} \times \mathbb{N})$. For $n = 2$, due to Lemmas 4.6 and 4.8, player 1 can compute
$x_2$ and $y_2$ as a function of $h_2^1$ independently of the chosen strategy of player 2. However, to
prove the property for $n \geq 3$, we cannot rely on the same argument. It would be tempting to
say that the distribution of $(k_2, h_2^1, h_2^2)$ belongs to $\Delta_f^*(K \times \mathbb{N} \times \mathbb{N})$ and to apply the preceding
argument when starting from this new initial distribution. But this would be wrong since
this distribution may depend on $\tau$. To overcome this problem, it is sufficient to prove that
the map $f_2$ appearing in (A2b) and the distribution $\eta_2$ appearing in (A3) depend on $\pi$ only
through $\Phi(\pi)$. Indeed, in this case, reasoning by induction, player 1 can compute $\eta_n$ as a
function of $\eta_{n-1}$ and his new signals, $x_n$ using Lemma 4.6, and $y_n$ using the map given by
(A2b) which will depend only on $\eta_{n-1}$ and his own strategy.

Let us prove this assertion. Let $\pi \in \Delta_f^*(K \times \mathbb{N} \times \mathbb{N})$, and $\sigma \in \Sigma'(\pi)$ be a reduced strategy,
which implies that there exists a map $\psi : \Delta(K) \times \Delta_f(\Delta(K)) \to \Delta(I)$ such that $\pi$-almost surely
$\sigma_1(c_1) = \psi(x_1, y_1)$. Assumption (A3) implies that $\eta_2$ is a function of the initial distribution
$\pi$ and $\sigma_1$ only. We denote it by $\eta_2(\pi, \sigma_1)$. We now prove that $\eta_2(\pi, \sigma_1)$ and the map $f_2$
appearing in (A2b) depend only on the projection of $\pi$, $\eta_1 = \Phi(\pi)$, and on the map $\psi$.

At first, given $\eta_1 \in \Delta_f(\Delta_f(\Delta(K)))$, we can construct a canonical probability $\bar\pi$ with finite
support on $K \times \Delta(K) \times \Delta_f(\Delta(K))$ defined by $\bar\pi(k, p, z) = p^k z(p) \eta_1(z)$. Applying (A3) and
(A2) in the game $\Gamma(\bar\pi)$ if player 1 plays $\sigma_1 = \psi$, there exists a distribution $\eta_2(\bar\pi, \psi)$ and a map
$f_2^{\psi}$ such that $y_2 = f_2^{\psi}(x_1, y_1)$ almost surely and $y_2$ has law $\eta_2(\bar\pi, \psi)$ for all $\tau$.
Recall that $\pi$ is such that $\Phi(\pi) = \eta_1$ and that $\sigma_1$ is such that $\sigma_1(c_1) = \psi(x_1, y_1)$. Note also
that $d_1$ and $x_1$ are conditionally independent given $y_1$ under $\pi$. Therefore, for any $\tau_1$, the joint
law of $(x_1, y_1, i_1, j_1, c_2, d_2)$ is the same under $\mathbb{P}^\pi_{\sigma_1,\tau_1}$ and under the probability $\mathbb{P}^{\bar\pi}_{\psi,\tau_1'}$, where $\tau_1'$
is defined as follows: choose $d_1$ using some exogenous lottery such that the conditional law of
$d_1$ given $y_1$ is the same as under $\pi$ and then play $\tau_1(d_1)$. We deduce that $\eta_2(\pi, \sigma_1) = \eta_2(\bar\pi, \psi)$
and $y_2 = f_2^{\psi}(x_1, y_1)$ under the probability $\mathbb{P}^\pi_{\sigma_1,\tau_1}$, which concludes the proof. □

4.5 A stronger version of the theorem 

To conclude this section, let us state a pair of stronger assumptions, which are expressed in
terms of the data of the game more directly: player 1 can deduce exactly the signal received
by player 2 and player 2 cannot influence the joint law of $(x_2, d_2)$.

Definition 4.10. For all $(x, i, j) \in \Delta(K) \times I \times J$, let $q_{C \times D}(x, i, j)$ denote the marginal distribution
on $C \times D$ induced by $q(x, i, j)$, i.e. $q_{C \times D}(x, i, j)(c, d) = \sum_{k, k' \in K} x(k)\,q(k, i, j)(k', c, d)$.

Let also $H_{x,i}$ be the map defined on $C \times D$ by

\[ H_{x,i}(c, d) = (F(x, i, c), d) \in \Delta(K) \times D. \]

With these notations, we can define a set of assumptions on the marginal of $q$. The
assumptions (A1), (A2a) are unchanged and we define (A'2b) and (A'3).

(A'2b) Player 1 knows the signal of player 2, i.e. there exists a map $h : C \to D$ such that for
all $(k, i, j) \in K \times I \times J$, $\sum_{c \in C} q(k, i, j)[c, h(c)] = 1$.

(A'3) The image probability $\phi(x, i)$ of $q_{C \times D}(x, i, j)$ by the map $H_{x,i}$ does not depend on $j$.

Corollary 4.11. Let $\Gamma$ be such that assumptions (A1), (A2a), (A'2b) and (A'3) are true.
Then:

For all $\pi \in \Delta_f^*(K \times \mathbb{N} \times \mathbb{N})$, $\Gamma(\pi)$ has a uniform value.
The proof of this corollary follows directly from the next Lemma.

Lemma 4.12. If (A1) and (A2a) hold, then (A'2b) and (A'3) imply (A2b) and (A3).

Proof. It follows from the definitions and from Lemma 4.6 that

\[
\begin{aligned}
\mathcal{L}_{\mathbb{P}^\pi_{\sigma,\tau}}(x_2, d_2 \mid c_1, d_1, i_1, j_1) &= \mathcal{L}_{\mathbb{P}^\pi_{\sigma,\tau}}(F(x_1, i_1, c_2), d_2 \mid c_1, d_1, i_1, j_1) \\
&= \mathcal{L}_{\mathbb{P}^\pi_{\sigma,\tau}}(H_{x_1,i_1}(c_2, d_2) \mid c_1, d_1, i_1, j_1) = \phi(x_1, i_1),
\end{aligned}
\]

since $(x_1, i_1)$ is measurable with respect to $(c_1, d_1, i_1, j_1)$ and $\phi(x_1, i_1)$ is the image probability
of $q_{C \times D}(x_1, i_1, j_1)$ by the map $H_{x_1,i_1}$. Therefore, the conditional law of the pair $(x_2, d_2)$ does
not depend on the strategy of player 2. Precisely, we have

\[ \mathcal{L}_{\mathbb{P}^\pi_{\sigma,\tau}}(x_2, d_2 \mid d_1, j_1) = \mathbb{E}^\pi_{\sigma,\tau}[\phi(x_1, i_1) \mid d_1, j_1]. \]

Since $j_1$ and $(x_1, i_1)$ are conditionally independent given $d_1$ it follows that

\[ \mathcal{L}_{\mathbb{P}^\pi_{\sigma,\tau}}(x_2, d_2 \mid d_1, j_1) = \mathbb{E}^\pi_{\sigma,\tau}[\phi(x_1, i_1) \mid d_1]. \]

The right hand side does not depend on $\tau$, so $\mathcal{L}_{\mathbb{P}^\pi_{\sigma,\tau}}(x_2, d_2 \mid d_1, j_1)$ does not depend on $\tau$ and
$j_1$, and the same is true for the (unconditional) law of $(x_2, y_2)$. As a consequence, the law
of $y_2$ (denoted $\eta_2$) does not depend on $\tau$, which proves (A3). It remains to prove that player
1 can compute the auxiliary random variable $y_2$. Using (A2a) and that $\sigma$ is reduced, $i_1$ can
be written as a measurable function of $(x_1, y_1)$ and of an independent random variable $u$
uniformly distributed on $[0, 1]$. Recall that the conditional law of $x_1$ given $d_1$ is $y_1$, so that

\[
\begin{aligned}
\mathcal{L}_{\mathbb{P}^\pi_{\sigma,\tau}}(x_2, d_2 \mid d_1, j_1) &= \mathbb{E}^\pi_{\sigma,\tau}[\phi(x_1, i_1(x_1, y_1, u)) \mid d_1] \\
&= \int_{\Delta(K) \times [0,1]} \phi(x, i_1(x, y_1, u)) \, dy_1[x] \, du.
\end{aligned}
\]

Player 1 can compute the conditional law of $(x_2, d_2)$ given $(d_1, j_1)$ since it depends only on
$(y_1, \sigma)$. Moreover by assumption (A'2b), he can deduce $d_1$ from his initial signal $c_1$, so he is
able to compute $y_2$, which proves (A2b). □

5 Proof of Theorem 2.3. 

The proof is divided into three steps. First, using Lemma 4.4, we define a value function
$v$ on $\Delta_f(\Delta_f(\Delta(K)))$ and prove that it is concave and Lipschitz. Secondly, we introduce an
auxiliary game $\mathcal{G}$ on $\Delta_f(\Delta(K))$ and check that it satisfies some (slightly) weakened assumptions
needed to apply a Theorem of Renault [10]. This implies the existence of a uniform value in
the auxiliary game. Finally we show that both players can guarantee this value in the original
game: player 2 by playing by blocks and player 1 by using optimal Markovian strategies in
the auxiliary game.

5.1 The canonical value function $v_\theta$

In view of Lemma 4.4, it is appropriate to work directly on the set $\Delta_f(\Delta_f(\Delta(K)))$, i.e.
for any $\pi, \pi'$ such that $\Phi(\pi) = \Phi(\pi')$ the value of the game is the same. At first, given
$\eta \in \Delta_f(\Delta_f(\Delta(K)))$, there is a canonical way to build a distribution $\pi$ such that $\Phi(\pi) = \eta$.

Definition 5.1. Let $\Gamma = (K, I, J, C, D)$ be a repeated game. For any $\eta \in \Delta_f(\Delta_f(\Delta(K)))$,
we define $D' = \mathrm{supp}(\eta) \subset \Delta_f(\Delta(K))$ and $C' = D' \times \big(\bigcup_{z \in \mathrm{supp}(\eta)} \mathrm{supp}(z)\big)$. By definition of $\eta$,
these sets are finite and we can define $\pi \in \Delta_f^*(K \times C' \times D')$ by

\[ \forall (k, p, z) \in K \times \Delta(K) \times \Delta_f(\Delta(K)), \quad \pi(k, (p, z), z) = \eta(z)\,z(p)\,p(k). \]

The canonical game $\Gamma(\pi)$ will be denoted $\Gamma(\eta)$, and its value $v_\theta(\eta)$. If $\eta = \delta_z$ for some $z \in
\Delta_f(\Delta(K))$, we will use the shorter notations $\Gamma(z) = \Gamma(\delta_z)$ and $v_\theta(z)$ for the value.

Informally, the game $\Gamma(\eta)$ proceeds as follows: $\eta$ is common knowledge, player 2 is informed
about the realization $z$ of a random variable of law $\eta$ (player 2 learns his beliefs). Then player
1 is informed about $z$ (his opponent's beliefs) and about the realization $p$ of a random variable
of law $z$ (his own beliefs). The state variable is finally selected according to $p$, but none of
the players observes it. If $\eta = \delta_z$ for some $z \in \Delta_f(\Delta(K))$, then the set of initial signals for
player 2 is reduced to a singleton. In this case, player 1 receives partial information about
the state, whereas player 2 only knows the joint distribution over the state and player 1's
signal. Using these notations, Lemma 4.4 implies that if $\pi, \pi' \in \Delta_f^*(K \times \mathbb{N} \times \mathbb{N})$ are such that
$\Phi(\pi) = \Phi(\pi')$, we have $v_\theta(\pi) = v_\theta(\pi') = v_\theta(\Phi(\pi))$.
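
The canonical construction of Definition 5.1 is easy to carry out for a small $\eta$. In the sketch below, beliefs are encoded as tuples of probabilities, an element $z$ as a tuple of (belief, weight) pairs and $\eta$ as a dictionary; this encoding, the function names and the numerical example are assumptions made for illustration only.

```python
from fractions import Fraction


def canonical_pi(eta):
    """Build pi(k, (p, z), z) = eta(z) z(p) p(k) from a finitely supported eta.

    A belief p is a tuple of probabilities over the states 0..|K|-1,
    z is a tuple of (p, weight) pairs and eta a dict {z: weight}.
    """
    pi = {}
    for z, eta_z in eta.items():
        for p, z_p in z:
            for k, p_k in enumerate(p):
                if eta_z * z_p * p_k > 0:
                    pi[(k, (p, z), z)] = eta_z * z_p * p_k
    return pi


def belief_given_signal(pi, c, n_states):
    """First-order belief of player 1 after receiving the signal c = (p, z)."""
    mass = sum(w for (k, cc, d), w in pi.items() if cc == c)
    return tuple(sum(w for (k, cc, d), w in pi.items() if cc == c and k == s) / mass
                 for s in range(n_states))


F = Fraction
p_low, p_high, p_mid = (F(1, 4), F(3, 4)), (F(3, 4), F(1, 4)), (F(1, 2), F(1, 2))
z1 = ((p_low, F(1, 2)), (p_high, F(1, 2)))
z2 = ((p_mid, F(1)),)
eta = {z1: F(1, 3), z2: F(2, 3)}

pi = canonical_pi(eta)
x_high = belief_given_signal(pi, (p_high, z1), 2)
```

In this example the belief of player 1 after the signal $(p, z)$ is $p$ itself, and the probability that player 2 receives $z$ is $\eta(z)$, as the informal description above suggests.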

In order to study the regularity of the canonical value function, let us recall some properties
of the Wasserstein distance.

Let $(Z, d)$ be a compact metric space and $\mathrm{Lip}_1(Z)$ the set of 1-Lipschitz functions on $Z$. The
function

\[ d : \Delta(Z) \times \Delta(Z) \to \mathbb{R}, \quad (\mu, \nu) \mapsto \sup_{f \in \mathrm{Lip}_1(Z)} \Big( \int_Z f \, d\mu - \int_Z f \, d\nu \Big) \]

is a distance on $\Delta(Z)$ which makes $\Delta(Z)$ compact. Moreover, for all $\mu, \nu \in \Delta(Z)$

\[ d(\mu, \nu) = \min_{\pi \in \mathcal{P}(\mu, \nu)} \int_{Z \times Z} |y - x| \, d\pi(x, y), \]

where $\mathcal{P}(\mu, \nu)$ is the set of probabilities on $Z \times Z$ having for marginals $\mu$ and $\nu$ (see e.g. [14]).
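
When $Z$ is a subset of the real line, the minimum above has a closed form: it equals the integral of the absolute difference of the two cumulative distribution functions. The sketch below uses this standard fact for finitely supported laws; the function name and the numerical example (two laws on $[0, 1]$ with $\varepsilon = 1/10$) are illustrative assumptions.

```python
from fractions import Fraction


def wasserstein_1d(mu, nu):
    """Wasserstein-1 distance between two finitely supported laws on the line.

    mu and nu are dicts {point: weight}. On the real line, the optimal
    transport cost equals the integral of |F_mu - F_nu|, where F denotes
    the cumulative distribution function.
    """
    points = sorted(set(mu) | set(nu))
    distance, cdf_gap = 0, 0
    for left, right in zip(points, points[1:]):
        cdf_gap += mu.get(left, 0) - nu.get(left, 0)
        distance += abs(cdf_gap) * (right - left)
    return distance


eps = Fraction(1, 10)
half = Fraction(1, 2)
z = {half: Fraction(1)}                      # Dirac mass at 1/2
z_prime = {half - eps: half, half + eps: half}  # split around 1/2
dist = wasserstein_1d(z, z_prime)
```

Here `dist` equals `eps` for the metric $|y - x|$ on $[0, 1]$. If $[0, 1]$ stands for $\Delta(K)$ with two states and the $\ell_1$ norm is used, every distance is multiplied by 2, since $\|(t, 1-t) - (t', 1-t')\|_1 = 2|t - t'|$.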

If $f$ is a bounded measurable function on $Z$, define $\tilde f : \Delta(Z) \to \mathbb{R}$ by $\tilde f(\mu) = \int_Z f \, d\mu$.
Then

\[ \tilde f \in \mathrm{Lip}_1(\Delta(Z), d) \iff f \in \mathrm{Lip}_1(Z). \]

In the following, $\Delta(K)$ is endowed with the $\ell_1$-norm induced by $\mathbb{R}^K$ and $\Delta(\Delta(K))$ is
endowed with the Wasserstein metric $d$ induced by the metric space $(\Delta(K), \ell_1)$.

Lemma 5.2. Let $\eta \in \Delta_f(\Delta_f(\Delta(K)))$ and $z \in \Delta_f(\Delta(K))$. Then $v_\theta(\eta)$ is linear on $\Delta_f(\Delta_f(\Delta(K)))$
and the mapping $z \mapsto v_\theta(z)$ on $\Delta_f(\Delta(K))$ is 1-Lipschitz for the Wasserstein metric $d$.

Proof. The first assertion is immediate since by definition both players learn the realization $z$
of the random variable of law $\eta$. Let $z, z' \in \Delta_f(\Delta(K))$. By definition of the Wasserstein distance, there exists $\mu \in
\Delta(\Delta(K) \times \Delta(K))$ such that the first marginal is $z$, the second is $z'$ and

\[ d(z, z') = \int_{\Delta(K) \times \Delta(K)} \|p - p'\|_1 \, d\mu(p, p'). \]

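We denote by $\mathcal{L}_\mu(p \mid p')$ the conditional law of $p$ given $p'$.
Let $\sigma \in \Sigma$ be a behavior strategy for player 1 in the game $\Gamma(z)$. As in Section 2.2, $\sigma(p)$
denotes the strategy of player 1 conditionally on the signal $p$. Let us construct a general
strategy for player 1 as follows. Let $(\Omega, \mathbb{P}) = ([0, 1], dx)$ be the auxiliary probability space$^3$ that
will be used as a "tossing coin". The classical representation result of Blackwell-Dubins (see
[2]) asserts that there exists a jointly Borel-measurable map $\phi : \Omega \times \Delta(\Delta(K)) \to \Delta(K)$
such that for all $\nu \in \Delta(\Delta(K))$, $\phi(\cdot, \nu)$ is a $\nu$-distributed random variable. Therefore, the
map $\sigma'(\omega, p') = \sigma(\phi(\omega, \mathcal{L}_\mu(p \mid p')))$ defines a general strategy which is equivalent to a behavior
strategy by Kuhn's theorem. It follows that

\[
\begin{aligned}
\gamma_\theta(z', \sigma', \tau)
&= \int_{\Delta(K) \times \Omega} \gamma_\theta(p', \sigma'(\omega, p'), \tau) \, dz'(p') \otimes d\mathbb{P}(\omega) \\
&= \int_{\Delta(K)} \Big( \int_\Omega \gamma_\theta(p', \sigma(\phi(\omega, \mathcal{L}_\mu(p \mid p'))), \tau) \, d\mathbb{P}(\omega) \Big) dz'(p') \\
&= \int_{\Delta(K)} \Big( \int_{\Delta(K)} \gamma_\theta(p', \sigma(p), \tau) \, d\mathcal{L}_\mu(p \mid p') \Big) d\mu(p') \\
&= \int_{\Delta(K) \times \Delta(K)} \gamma_\theta(p', \sigma(p), \tau) \, d\mu(p, p'),
\end{aligned}
\]

where the last equality follows from $dz'(p') = d\mu(p')$. Recall that by assumption $g$ takes values
in $[0, 1]$. Consequently, $\gamma_\theta(p, \sigma(p), \tau) \in [0, 1]$ for all $p, \sigma, \tau$. Hence

\[
\begin{aligned}
|\gamma_\theta(z', \sigma', \tau) - \gamma_\theta(z, \sigma, \tau)|
&\leq \int_{\Delta(K) \times \Delta(K)} |\gamma_\theta(p, \sigma(p), \tau) - \gamma_\theta(p', \sigma(p), \tau)| \, d\mu(p, p') \\
&\leq \int_{\Delta(K) \times \Delta(K)} \|p - p'\|_1 \, d\mu(p, p') \\
&= d(z, z').
\end{aligned}
\]

It follows that $|v_\theta(z) - v_\theta(z')| \leq d(z, z')$ for any $z, z' \in \Delta_f(\Delta(K))$. □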

Note that usually, the underlying space is $\Delta(K)$ with the discrete metric on $K$ and, in order
to prove that the value is 1-Lipschitz, we can use the same strategy in $\Gamma(z)$ and in $\Gamma(z')$. Here,
we cannot use $\sigma$ directly. The state space is $\Delta_f(\Delta(K))$ with the norm $\ell_1$ on $\Delta(K)$, and two
states may be close while having disjoint supports. Therefore an optimal strategy $\sigma$ in $\Gamma(z)$
may make no sense in $z'$. The idea behind the above proof is to construct, given $\sigma$ in $\Gamma(z)$, a
strategy $\sigma'$ in $\Gamma(z')$ which behaves in $z'$ like $\sigma$ in $z$.

Example 5.3. Assume that $K = \{k_1, k_2\}$ and let $z = \delta_{1/2}$ and $z' = \tfrac12\delta_{1/2 - \varepsilon} + \tfrac12\delta_{1/2 + \varepsilon}$ be two
initial distributions in $\Delta_f(\Delta(K))$ (where we identified $\Delta(K)$ and $[0, 1]$). A strategy in $\Gamma(z)$ is
defined only at $\tfrac12$ since it can be modified elsewhere without altering the payoff. Therefore an
optimal strategy $\sigma$ in $\Gamma(z)$ can play anything in $\tfrac12 - \varepsilon$ and in $\tfrac12 + \varepsilon$ since no regularity for
$\sigma$ is required. The good way to use the proximity between $z$ and $z'$ is to always play as if the
initial distribution was $\tfrac12$. Here we have to define $\sigma'$ such that for all $p \in \Delta(K)$, $\sigma'(p) = \sigma(\tfrac12)$.



3 Using a continuum of alternatives is clearly unnecessary but simplifies the proof.

Lemma 5.4 (Splitting procedure). The mapping $z \mapsto v_\theta(z)$ is concave on $\Delta_f(\Delta(K))$.

Proof. We follow the same scheme as for games with incomplete information on one side (see
e.g. [5] Corollary 1.3 p. 184). Let $\lambda \in [0, 1]$ and let $Y$ be a random variable with values in
$\{0, 1\}$, such that $\mathbb{P}(Y = 0) = \lambda$. Let $z, z' \in \Delta_f(\Delta(K))$. The random variable $P$ is selected
according to the distribution $z$ if $Y = 0$ and $z'$ if $Y = 1$; the state variable $k_1$ is finally selected
according to $p$ if $P = p$. Compare now the two following situations: on one hand, the game
with initial signals $(Y, P)$ for player 1 and nothing for player 2 and on the other hand the
game with initial signals $(Y, P)$ for player 1 and $Y$ for player 2. These two distributions of
initial signals and states fulfill our assumptions and it is clear that the value of the second is
less than or equal to the value of the first for any evaluation $\theta \in \Delta_f(\mathbb{N}^*)$, since the set of behavior
strategies of player 2 in the second game is larger than in the first game. Translating this
inequality using $v$, we deduce directly

\[ v_\theta(\delta_{\lambda z + (1 - \lambda) z'}) \geq v_\theta(\lambda \delta_z + (1 - \lambda) \delta_{z'}) = \lambda v_\theta(z) + (1 - \lambda) v_\theta(z'), \]

which proves the Lemma. □

5.2 Auxiliary game Q 

Let $X = \Delta_f(\Delta(K))$ be the state space, which corresponds to player 2's belief about player
1's belief about the current state. It is a convex relatively compact subset of a normed vector
space and we are going to express the auxiliary game and the recursive formula on this state
space.

Let $\mathcal{G}$ be the stochastic game defined by

• the state space $X = \Delta_f(\Delta(K))$,

• the action space $A = \{f : \Delta(K) \to \Delta(I), \text{ measurable}\}$ for player 1,

• the action space $B = \Delta(J)$ for player 2,

• the payoff function $G : X \times A \times B \to [0, 1]$ defined, for any $z \in X$, by

\[ G(z, a, b) = \sum_{p \in \mathrm{supp}(z)} \Big( \sum_{(i,j) \in I \times J} b(j)\,a(p, i)\,g(p, i, j) \Big) z(p), \]

where $\mathrm{supp}(z)$ stands for the support of $z$,

• the transition function $\ell : X \times A \times B \to \Delta_f(X)$ is defined as $\ell(z, a, b) = \Phi(Q(z, a, b))$,
where $Q(z, a, b) \in \Delta_f(K \times (\Delta(K) \times I \times C) \times (J \times D))$ is the induced joint distribution of
$(k_2, (p, i_1, c_2), (j_1, d_2))$ in the canonical game $\Gamma(\delta_z)$ where players play at the first stage
$\sigma_1 = a$ and $\tau_1 = b$. The sets $C$, $D$, $K$ and $\mathrm{supp}(z)$ being finite and using assumptions
(A1) and (A2), we may consider $Q$ as an element in $\Delta_f^*(K \times \mathbb{N} \times \mathbb{N})$.
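
The stage payoff $G$ of the auxiliary game can be evaluated directly from its definition. The sketch below assumes that $g(p, i, j)$ denotes the expectation $\sum_k p(k)\,g(k, i, j)$, and uses a hypothetical payoff table `g`, a hypothetical state `z` and hypothetical actions `a`, `b`; none of these numbers come from the paper.

```python
from fractions import Fraction


def stage_payoff(z, a, b, g):
    """G(z, a, b) = sum_p z(p) sum_{i,j} a(p)(i) b(j) g(p, i, j),
    where g(p, i, j) = sum_k p[k] * g[k][i][j] (assumed linear extension)."""
    total = 0
    for p, z_p in z.items():
        for i, a_pi in a(p).items():
            for j, b_j in b.items():
                g_pij = sum(p_k * g[k][i][j] for k, p_k in enumerate(p))
                total += z_p * a_pi * b_j * g_pij
    return total


F = Fraction
# Hypothetical payoffs g[k][i][j] for two states.
g = [
    {"T": {"L": 1, "R": 0}, "B": {"L": 0, "R": 0}},
    {"T": {"L": 0, "R": 0}, "B": {"L": 0, "R": 1}},
]
# State of the auxiliary game: a law over beliefs of player 1.
z = {(F(3, 4), F(1, 4)): F(1, 2), (F(1, 4), F(3, 4)): F(1, 2)}


def a(p):
    """Action of player 1: play T when state 0 is at least as likely, else B."""
    return {"T": F(1)} if p[0] >= F(1, 2) else {"B": F(1)}


b = {"L": F(1, 2), "R": F(1, 2)}
payoff = stage_payoff(z, a, b, g)
```

The function is linear in `b` and, for a fixed `z`, linear in the joint law of beliefs and actions induced by `(z, a)`, which is the property used in the proof of Lemma 5.6.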

Let us recall the definition of the Choquet order on $\Delta_f(X)$.

Definition 5.5. The order $\preceq$ on $\Delta_f(X)$ called (reversed) Choquet order is defined by the
relation

\[ \mu \preceq \nu \iff \text{for all continuous concave functions } f \text{ on } X, \quad \tilde f(\mu) \leq \tilde f(\nu), \]

where $\tilde f(\mu) = \int_X f \, d\mu$.

We aim to apply a weakened version of Renault [10] to the game $\mathcal{G}$, thus let us first recall
the hypotheses of the Theorem as they appear in the original article.

Hypotheses 5.1.

H1) The map $\ell$ does not depend on $b$.

H2) $X$ is a compact convex subset of a normed vector space,

H3) $A$ and $B$ are convex compact subsets of some topological vector spaces,

H4) $(a \mapsto G(z, a, b))$ is concave upper semi-continuous $\forall (z, b) \in X \times B$ and $(b \mapsto G(z, a, b))$ is
convex and lower semi-continuous $\forall (z, a) \in X \times A$.

H5) There exists a subset $\mathcal{C}$ of 1-Lipschitz functions containing $\phi(1, 0)$ such that for all $f$ in $\mathcal{C}$,
$\alpha \in [0, 1]$, the function $\phi(\alpha, f)$ is in $\mathcal{C}$, where $\phi(\alpha, f)$ is defined by

\[ \forall z \in X, \quad \phi(\alpha, f)(z) = \sup_{a \in A} \min_{b \in B} \big\{ \alpha G(z, a, b) + (1 - \alpha) f(\ell(z, a)) \big\}. \]

H6) The mapping $a \mapsto \ell(z, a)$ is concave for the Choquet order and continuous.

H7) (Splitting assumption) Let $z$ be a convex combination in $\Delta_f(\Delta(K))$, $z = \sum_{s=1}^{S} \lambda_s z_s$, and
$(a_s)_{s \in S}$ be a family of actions in $A^S$. Then there exists $a \in A$ such that

\[ \ell(z, a) \succeq \sum_{s \in S} \lambda_s \ell(z_s, a_s) \quad \text{and} \quad \min_{b \in B} G(z, a, b) \geq \sum_{s \in S} \lambda_s \min_{b \in B} G(z_s, a_s, b). \]
The main consequence of assumption (A3) is that player 2 cannot influence the transition
in the auxiliary game so the map $\ell$ does not depend on $b$, i.e.

\[ \forall (z, a) \in X \times A, \ \forall b, b' \in B, \quad \ell(z, a, b) = \ell(z, a, b'). \]

Thus (H1) is satisfied and from now on, we will work under the shorter notation $\ell(z, a)$ for
$\ell(z, a, b)$.

The hypotheses (H2, H3, H4, H6, H7) ensure the application of Sion's theorem in several
steps of Renault's proof. Here they are not all satisfied since, for example, the set $A$ is
not compact. However, it is well known that adding some geometrical hypotheses makes it possible to
weaken the topological assumptions in Sion's theorem (see, for instance, Proposition A.8 in
Sorin's monograph [13]). For instance, if $A$ is a convex set, $B$ is a compact convex subset of
a topological vector space, $(a \mapsto G(z, a, b))$ is concave $\forall (z, b) \in X \times B$ and $(b \mapsto G(z, a, b))$
is convex and lower semi-continuous $\forall (z, a) \in X \times A$, Sion's result applies to the one-stage
game: the game $\mathcal{G}_1(z)$ has a value. They can be replaced without altering the proof by the
following hypotheses.

Hypotheses 5.2.

H2') $X$ is a relatively compact convex subset of a normed vector space.

H3') $B$ is a convex compact subset of a topological vector space, $A$ is a convex set.

H4') $(a \mapsto G(z, a, b))$ is concave $\forall (z, b) \in X \times B$ and $(b \mapsto G(z, a, b))$ is convex and lower
semi-continuous $\forall (z, a) \in X \times A$.

H6') The mapping $a \mapsto \ell(z, a)$ is concave for the Choquet order.

Assumption (H2') is satisfied since the Wasserstein distance can be extended to a norm
on the space of finite signed measures. Moreover assumptions (H3') and (H4') are clearly
satisfied. Therefore, we need to prove (H6') and (H7).

Lemma 5.6. The game $\mathcal{G}$ fulfills (H6') and (H7).

Proof. Let $z$ be a convex combination in $X$, $z = \sum_{s \in S} \lambda_s z_s$, and $(a_s)_{s \in S}$ be a family of actions
in $A^S$. Denote by $\mu(z_s, a_s) \in \Delta_f(\Delta(K) \times I)$ the joint law induced on $\Delta(K) \times I$ by $(z_s, a_s)$. By
disintegration, there exists $a \in A$ such that $\mu(z, a) = \sum_{s \in S} \lambda_s \mu(z_s, a_s)$. At first, note that
$Q(z, a, b) = \sum_{s \in S} \lambda_s Q(z_s, a_s, b)$.
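
The action $a$ obtained by disintegration can be written explicitly as $a(p)(i) = \sum_s \lambda_s z_s(p)\,a_s(p)(i) / z(p)$ on the support of $z$. The following sketch computes it for finitely supported data; beliefs are identified with numbers in $[0, 1]$ (two states), and the weights, states and actions are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def joint_law(z, a):
    """mu(z, a): joint law on beliefs and actions, mu(p, i) = z(p) a[p][i]."""
    return {(p, i): z_p * a_pi
            for p, z_p in z.items() for i, a_pi in a[p].items()}


def split_action(lambdas, zs, actions):
    """Return (z, a) with z = sum_s lambda_s z_s and
    mu(z, a) = sum_s lambda_s mu(z_s, a_s), obtained by disintegration."""
    z = defaultdict(Fraction)
    mix = defaultdict(Fraction)
    for lam, z_s, a_s in zip(lambdas, zs, actions):
        for (p, i), m in joint_law(z_s, a_s).items():
            z[p] += lam * m
            mix[(p, i)] += lam * m
    a = defaultdict(dict)
    for (p, i), m in mix.items():
        if z[p] > 0:
            a[p][i] = m / z[p]
    return dict(z), dict(a)


F = Fraction
q_, r_ = F(1, 2), F(1, 4)          # two beliefs, identified with points of [0, 1]
lambdas = [F(1, 3), F(2, 3)]
zs = [{q_: F(1)}, {q_: F(1, 2), r_: F(1, 2)}]
actions = [{q_: {"T": F(1)}},
           {q_: {"B": F(1)}, r_: {"T": F(1, 2), "B": F(1, 2)}}]

z, a = split_action(lambdas, zs, actions)
```

At the belief `q_`, which belongs to the supports of both $z_1$ and $z_2$, the resulting action mixes $a_1$ and $a_2$ with weights proportional to $\lambda_s z_s(p)$.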

Given $(z, a, b)$, we consider the canonical game $\Gamma(z)$. In this game, a pair $(k, p)$ is chosen
according to the probability $\pi \in \Delta_f(K \times \Delta(K))$ defined by $\pi(k, p) = p^k z(p)$ for all $(k, p) \in
K \times \Delta(K)$. Then, player 1 receives the signal $c_1 = p$ and player 2 receives no initial signal.
We associate to $(a, b)$ a pair of strategies for the first stage $(\sigma_1, \tau_1)$ by $\sigma_1(p) = a(p)$ and $\tau_1 = b$.
Then, $Q(z, a, b)$ denotes the joint distribution of $(k_2, (p, i_1, c_2), (j_1, d_2))$. Since the conditional
law of $(k_2, c_2, d_2)$ given $(p, i_1, j_1)$ is $q(p, i_1, j_1)$, it follows that $Q(z, a, b)$ is bilinear with respect
to $(\mu(z, a), b)$ (with abusive notations).

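Let $\rho = Q(z,a,b)$ (resp. $\rho_s = Q(z_s,a_s,b)$), and let $c' = (p, i_1, c_2)$ and $d' = (j_1, d_2)$ denote the
signals of player 1 and player 2. Let
$C' = \left(\bigcup_{s \in S} \mathrm{supp}(z_s)\right) \times I \times C$ and $D' = J \times D$. By construction,

\[
\ell(z,a) = \Phi(\rho) = \mathcal{L}_\rho\big(\mathcal{L}_\rho(\mathcal{L}_\rho(k_2 \mid c') \mid d')\big)
= \sum_{d' \in D'} \rho(d')\, \delta_{\mathcal{L}_\rho(\mathcal{L}_\rho(k_2 \mid c') \mid d')}.
\]

Using Lemma 4.6, we have the following equality $\rho$-almost surely:

\[
\mathcal{L}_\rho(k_2 \mid p, i_1, c_2) = F(p, i_1, c_2).
\]

This implies that

\[
\mathcal{L}_\rho\big(\mathcal{L}_\rho(k_2 \mid c'), d'\big) = \mathcal{L}_\rho\big(F(c'), d'\big) \in \Delta_f(C'' \times D'),
\]

where $C'' = F(C')$. A similar equality holds with $\rho_s$ instead of $\rho$ for all $s \in S$. By definition
of $\ell(z, a)$, we deduce that

\[
\ell(z,a) = \mathcal{L}_\rho\big(\mathcal{L}_\rho(F(c') \mid d')\big) = \Psi\big(\mathcal{L}_\rho(F(c'), d')\big),
\]

where $\Psi$ is the disintegration map defined by

\[
\Psi : \Delta(C'' \times D') \to \Delta_f(\Delta(C'')) : m \mapsto \sum_{d' \in D'} m(d')\, \delta_{\mathcal{L}_m(c'' \mid d')},
\]

where $\mathcal{L}_m(c'' \mid d')$ denotes the conditional law of $c''$ given $d'$. It was proved in Renault [10]
(Lemma 4.16) that $\Psi$ is concave for the Choquet order on $\Delta_f(\Delta(C''))$. However, $C''$ being a
finite subset of $\Delta(K)$, $\Delta(C'')$ is identified as a compact convex subset of $X$. It follows easily
that the convex order on $\Delta_f(\Delta(C''))$ coincides with the order induced by the convex order
on $\Delta_f(X)$.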
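
As an illustration only, the disintegration map $\Psi$ can be computed explicitly on a finite joint law. The following Python sketch is not part of the original argument: the function names `disintegrate` and `barycenter` and the dictionary encoding of laws are assumptions made for this example. It attaches the weight $m(d')$ to the conditional law of $c''$ given $d'$, and the barycenter of the resulting law on laws is the marginal of $m$ on $C''$.

```python
from collections import defaultdict
from fractions import Fraction


def disintegrate(m):
    """Sketch of Psi on a finite joint law.

    m: dict mapping pairs (c, d) to probabilities (Fractions summing to 1).
    Returns a dict mapping each conditional law of c given d, encoded as a
    sorted tuple of (c, probability) pairs, to its total weight m(d).
    """
    marginal_d = defaultdict(Fraction)
    for (c, d), w in m.items():
        marginal_d[d] += w

    law_on_laws = defaultdict(Fraction)
    for d, weight in marginal_d.items():
        if weight == 0:
            continue  # signals of probability zero carry no conditional law
        cond = tuple(sorted(
            (c, w / weight) for (c, dd), w in m.items() if dd == d and w > 0
        ))
        # Two signals d inducing the same conditional law are merged.
        law_on_laws[cond] += weight
    return dict(law_on_laws)


def barycenter(law_on_laws):
    """Average the conditional laws with their weights (marginal on c)."""
    bar = defaultdict(Fraction)
    for cond, weight in law_on_laws.items():
        for c, p in cond:
            bar[c] += weight * p
    return dict(bar)


# Hypothetical joint law on {c1, c2} x {d1, d2}.
m = {
    ("c1", "d1"): Fraction(1, 4),
    ("c2", "d1"): Fraction(1, 4),
    ("c1", "d2"): Fraction(1, 2),
}
psi_m = disintegrate(m)
```

Here `psi_m` puts weight 1/2 on the uniform law on {c1, c2} and weight 1/2 on the Dirac mass at c1. Exact rationals are used so that equal conditional laws are recognised as the same dictionary key.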

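We conclude that the first part of H7 holds since

\[
\ell(z,a) = \Psi\Big(\sum_{s \in S} \lambda_s \mathcal{L}_{\rho_s}(F(c'), d')\Big)
\geq \sum_{s \in S} \lambda_s \Psi\big(\mathcal{L}_{\rho_s}(F(c'), d')\big)
= \sum_{s \in S} \lambda_s \ell(z_s, a_s).
\]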






For the second part of H7, it is sufficient to note that (again with abusive notations) $\mu(z, a) \mapsto
G(z, a, b)$ is linear so that for all $b \in B$

\[
G(z, a, b) = \sum_{s \in S} \lambda_s G(z_s, a_s, b),
\]

which implies the result. Finally, in case $z_s = z$ for all $s$, the same arguments also imply
(H6') since in this case one can choose $a = \sum_{s \in S} \lambda_s a_s$ in the above proof. □

The proof of the following proposition follows from Proposition 3.21 in Renault [10]. 

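Proposition 5.7. Assuming (H1, H2', H3', H4', H6', H7), then for any $\theta \in \Delta_f(\mathbb{N}^*)$ and any
$\eta \in \Delta_f(X)$, the game $\mathcal{G}_\theta(\eta)$ has a value $w_\theta(\eta)$ such that

\begin{align}
\forall z \in X, \quad w_\theta(z) &= \sup_{a \in A} \min_{b \in B} \left\{ \theta_1 G(z, a, b) + (1 - \theta_1) w_{\theta^+}(\ell(z, a)) \right\}, \tag{5.1} \\
&= \min_{b \in B} \sup_{a \in A} \left\{ \theta_1 G(z, a, b) + (1 - \theta_1) w_{\theta^+}(\ell(z, a)) \right\}, \tag{5.2}
\end{align}

where $\theta^+$ is defined by $\theta^+_t = \theta_{t+1} / \sum_{m \geq 2} \theta_m$ for $t \geq 1$ whenever $\sum_{m \geq 2} \theta_m > 0$ and is defined
arbitrarily otherwise. Moreover, in $\mathcal{G}_\theta(\eta)$, player 1 has $\varepsilon$-optimal Markov strategies for all
$\varepsilon > 0$ and player 2 has optimal Markov strategies.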
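
To make the shifted evaluation $\theta^+$ concrete, here is a minimal Python sketch. It is an illustration added for the reader, not part of the original proof: the name `shift_evaluation` and the dictionary encoding of $\theta$ are assumptions, and the "arbitrary" case where the tail mass is zero is represented here by returning `None`.

```python
from fractions import Fraction


def shift_evaluation(theta):
    """Return theta^+ with theta^+_t = theta_{t+1} / sum_{m >= 2} theta_m.

    theta: dict mapping stages t >= 1 to weights summing to 1.
    Returns None when the tail mass sum_{m >= 2} theta_m is zero, the case
    in which theta^+ may be chosen arbitrarily.
    """
    tail = sum(w for t, w in theta.items() if t >= 2)
    if tail == 0:
        return None
    # Drop stage 1, shift the remaining stages by one and renormalise.
    return {t - 1: w / tail for t, w in theta.items() if t >= 2 and w != 0}


# Hypothetical evaluation: weight 1/2 on stage 1, 1/4 on stages 2 and 3.
theta = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}
theta_plus = shift_evaluation(theta)
```

In this example the tail mass equals $1 - \theta_1 = 1/2$, which is the coefficient of the continuation value in (5.1), and `theta_plus` is the uniform law on stages 1 and 2.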

In order to prove the last assumption (H5), we first prove that the value of the game
$\mathcal{G}_\theta$ is equal to the canonical value function $v_\theta$. Since we proved that the canonical value is
1-Lipschitz, it will imply, using the previous Proposition, that the set of functions $C = \{v_\theta, \theta \in
\Delta_f(\mathbb{N}^*)\}$ satisfies (H5).

We now prove that the value functions of both games are the same. The proof is classical
and consists in showing that both families of functions satisfy the same recursive formula.

Proposition 5.8. For all $\theta \in \Delta_f(\mathbb{N}^*)$ and for any $z \in X$, $w_\theta(z) = v_\theta(z)$.

Corollary 5.9. The game $\mathcal{G}$ fulfills (H5).

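Proof of Proposition 5.8. Notice first that $v_1(z) = w_1(z)$ for all $z$. This follows almost
directly from the definition:

\begin{align*}
v_1(z) &= \sup_{a_1 : \Delta(K) \to \Delta(I)} \ \inf_{b \in \Delta(J)} \int_{\Delta(K)} g(p, a_1(p), b)\, dz(p) \\
&= \sup_{a \in A} \min_{b \in \Delta(J)} G(z, a, b) \\
&= \min_{b \in \Delta(J)} \sup_{a \in A} G(z, a, b) \\
&= w_1(z).
\end{align*}

It is enough to prove that $w$ and $v$ satisfy the same recurrence formula. We will prove that $v$
satisfies the recurrence formula in $\mathcal{G}$, i.e.

\begin{align*}
v_\theta(z) &= \sup_{a \in A} \min_{b \in B} \ \theta_1 G(z, a, b) + (1 - \theta_1) v_{\theta^+}(\ell(z, a)) \\
&= \min_{b \in B} \sup_{a \in A} \ \theta_1 G(z, a, b) + (1 - \theta_1) v_{\theta^+}(\ell(z, a)).
\end{align*}
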
We prove the recursive formula by induction on the greatest element in the support of $\theta$. If
$\theta = \delta_1$, it follows from the preceding equality. Fix now $n \geq 2$, and assume that the proposition
is true for every $\theta$ supported by $\{1, \dots, n-1\}$. Let $z \in \Delta_f(\Delta(K))$. We first prove that player
1 can defend in $\Gamma_\theta(z)$ the quantity

\[
\min_{b \in B} \sup_{a \in A} \left( \theta_1 G(z, a, b) + (1 - \theta_1) v_{\theta^+}(\ell(z, a)) \right).
\]

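Using the canonical representation $\Gamma$, $v_\theta(z) = v_\theta(\pi)$ where $\pi \in \Delta_f(K \times \Delta(K) \times \Delta_f(\Delta(K)))$
is defined by

\[
\forall (k,p,x) \in K \times \Delta(K) \times \Delta_f(\Delta(K)), \quad \pi(k,p,x) = p(k)\, z(p)\, \mathbb{1}_{x = z}.
\]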

Consider the game $\Gamma_\theta(\pi)$. Let $\varepsilon > 0$ and $\tau$ be a strategy of player 2. Denoting by $b$ the law
induced by $\tau_1$, let $a^* \in A$ be an action which realizes the supremum up to $\varepsilon$ in the expression

\[
\theta_1 G(z,a,b) + (1 - \theta_1) v_{\theta^+}(\ell(z,a)).
\]

Let $\sigma^*$ be an $\varepsilon$-optimal strategy in the game $\Gamma_{\theta^+}(Q(z,a^*,b))$. Define then $\sigma$ by $\sigma_1 = a^*$ and,
for all $n \geq 2$ and $h^{I}_n = (p, i_1, c_2, \dots, i_{n-1}, c_n)$, $\sigma_n(h^{I}_n) = \sigma^*_{n-1}(c', h^{I,+}_{n-1})$ where $c' = (p, i_1, c_2)$ and
$h^{I,+}_{n-1} = (i_2, c_3, \dots, i_{n-1}, c_n)$. We have

\[
\gamma_\theta(\pi, \sigma, \tau) = \theta_1 G(z, a^*, b) + (1 - \theta_1)\, \gamma_{\theta^+}(Q(z, a^*, b), \sigma^*, \tau^+),
\]

where $\tau^+$ is a continuation strategy. Precisely, for all $n \geq 2$, $\tau^+_{n-1}(d', h^{II,+}_{n-1}) = \tau_n(h^{II}_n)$ with
$h^{II,+}_{n-1} = (j_2, d_3, \dots, j_{n-1}, d_n)$, $h^{II}_n = (d', h^{II,+}_{n-1})$ and $d' = (j_1, d_2)$ is the "signal" for player 2 given
by $Q(z,a^*,b)$.

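Therefore, $\sigma^*$ and $\tau^+$ can be seen as behavior strategies in a new game with initial signals
corresponding to the past history in the original game, and since $\sigma^*$ is $\varepsilon$-optimal in $\Gamma_{\theta^+}(Q(z, a^*, b))$,
we have

\begin{align*}
\gamma_\theta(\pi, \sigma, \tau) &\geq \theta_1 G(z, a^*, b) + (1 - \theta_1) v_{\theta^+}(Q(z, a^*, b)) - \varepsilon \\
&\geq \sup_{a \in A} \ \theta_1 G(z, a, b) + (1 - \theta_1) v_{\theta^+}(Q(z, a, b)) - 2\varepsilon \\
&= \sup_{a \in A} \ \theta_1 G(z, a, b) + (1 - \theta_1) v_{\theta^+}(\ell(z, a)) - 2\varepsilon \\
&\geq \min_{b \in B} \sup_{a \in A} \ \theta_1 G(z, a, b) + (1 - \theta_1) v_{\theta^+}(\ell(z, a)) - 2\varepsilon.
\end{align*}

It follows that $v_\theta(z) \geq \min_{b \in B} \sup_{a \in A} \theta_1 G(z, a, b) + (1 - \theta_1) v_{\theta^+}(\ell(z, a))$ by sending $\varepsilon$ to zero.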

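Let us show that player 2 can defend $\sup_{a \in A} \min_{b \in B} \left( \theta_1 G(z, a, b) + (1 - \theta_1) v_{\theta^+}(\ell(z, a)) \right)$ in $\Gamma_\theta(\pi)$.
Fix a strategy $\sigma$ of player 1 and let $a = \sigma_1$; there exists $b^* \in B$ achieving $\min_b G(z, a, b)$. We
also choose $\tau^*$ an optimal strategy for player 2 in the game $\Gamma_{\theta^+}(Q(z,a,b^*))$. This defines a
strategy $\tau$ such that

\begin{align*}
\gamma_\theta(\pi, \sigma, \tau) &= \theta_1 G(z,a,b^*) + (1 - \theta_1)\, \gamma_{\theta^+}(Q(z,a,b^*), \sigma^+, \tau^*) \\
&\leq \theta_1 G(z, a, b^*) + (1 - \theta_1) v_{\theta^+}(Q(z, a, b^*)) \\
&= \theta_1 G(z, a, b^*) + (1 - \theta_1) v_{\theta^+}(\ell(z, a)) \\
&= \min_{b \in B} \ \theta_1 G(z,a,b) + (1 - \theta_1) v_{\theta^+}(\ell(z,a)).
\end{align*}

Thus $v_\theta(z) \leq \sup_{a \in A} \min_{b \in B} \left( \theta_1 G(z, a, b) + (1 - \theta_1) v_{\theta^+}(\ell(z, a)) \right)$. Finally, since the maxmin is
always smaller than the minmax, all the intermediate inequalities are equalities. □
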
5.3 Existence of the uniform value

Let us first recall the first main result proved in [10], which holds under our set of weakened
assumptions.

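Theorem 5.10 (Renault (2012)). Assume that H1, H2', H3', H4', H5, H6', H7 hold. Then
for every initial distribution $\eta \in \Delta_f(X)$, the game has a uniform value $w^*(\eta)$. Moreover
player 1 can guarantee $w^*(\eta)$ with a Markov strategy:

\[
\forall \varepsilon > 0, \ \exists \sigma \in \Sigma^M, \ \exists N_0 \in \mathbb{N}, \ \forall N \geq N_0, \ \forall \tau \in \mathcal{T}, \quad \gamma_N(\eta, \sigma, \tau) \geq w^*(\eta) - \varepsilon,
\]

and we have $w^*(\eta) = \inf_{n \geq 1} \sup_{m \geq 0} w_{m,n}(\eta)$.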

In order to conclude the proof, we show that both players can guarantee

\[
v^*(\pi) = \inf_{n \geq 1} \sup_{m \geq 0} v_{m,n}(\pi),
\]

where $v_{m,n}(\pi) = v_{\theta_{m,n}}(\pi)$ and $\theta_{m,n}$ is the uniform law between stage $m$ and $m + n$.
The game $\mathcal{G}(z)$ satisfies the assumptions of Theorem 5.10, so it has a uniform value given by

\[
w^*(z) = \inf_{n \geq 1} \sup_{m \geq 0} w_{m,n}(z).
\]
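
The quantity $\inf_n \sup_m$ of block averages can be illustrated on a degenerate example with no players, where the stage payoff is a fixed function of the stage. The Python sketch below is only meant to show the arithmetic of the formula. The names `v_mn` and `inf_sup` are hypothetical, the block is taken to be stages $m+1, \dots, m+n$ (one possible reading of "between stage $m$ and $m+n$", assumed here), and the ranges of $m$ and $n$ are truncated.

```python
from fractions import Fraction


def v_mn(g, m, n):
    """Average of the payoff stream g over stages m+1, ..., m+n (assumed convention)."""
    return Fraction(sum(g(t) for t in range(m + 1, m + n + 1)), n)


def inf_sup(g, max_m, max_n):
    """Truncated version of inf_{n >= 1} sup_{m >= 0} v_mn(g, m, n)."""
    return min(
        max(v_mn(g, m, n) for m in range(max_m + 1))
        for n in range(1, max_n + 1)
    )


def alternating(t):
    """Hypothetical payoff stream 1, 0, 1, 0, ... indexed by stages t >= 1."""
    return 1 if t % 2 == 1 else 0


# The stream has period 2, so m in {0, 1} already covers every block position.
value = inf_sup(alternating, max_m=1, max_n=6)
```

For this alternating stream, blocks of odd length can be placed so as to average more than 1/2, while blocks of even length always average exactly 1/2, so the truncated infimum equals 1/2.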

And by Proposition 5.8, the values in $\mathcal{G}$ and in the reduced game are equal, so if $\pi \in \Delta^*_f(K \times
\mathbb{N} \times \mathbb{N})$ we have

\[
v^*(\pi) = \inf_{n \geq 1} \sup_{m \geq 0} v_{m,n}(\pi) = \inf_{n \geq 1} \sup_{m \geq 0} v_{m,n}(\Phi(\pi)) = \inf_{n \geq 1} \sup_{m \geq 0} w_{m,n}(\Phi(\pi)) = w^*(\Phi(\pi)).
\]

Thus player 1 can guarantee $v^*(\pi)$ in $\mathcal{G}(\Phi(\pi))$ with a Markov strategy. Let us check that he
can guarantee $v^*(\pi)$ in the game $\Gamma(\Phi(\pi))$ or equivalently in $\Gamma(\pi)$.

Proposition 5.11. Any Markovian strategy $\sigma$ of player 1 in $\mathcal{G}_\infty(z)$ induces a strategy $\bar{\sigma}$ in
$\Gamma_\infty(z)$ guaranteeing the same amount.

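Proof. Let $\sigma$ be a Markovian behavior strategy in $\mathcal{G}_\infty(z)$. Let us describe the strategy $\bar{\sigma}$. Player 1 plays
at the first round in $\Gamma_\infty(z)$ the mixed action $\sigma_1(z)(p)$ where $p$ is his initial signal. Then, at
round $n$, he plays the mixed action $\sigma_n(y_n)(x_n)$. That this strategy is well defined
follows from Lemma 4.9.

It remains to prove that this strategy guarantees the same quantity as $\sigma$. Let us fix $n \in \mathbb{N}^*$;
we will prove that there exists a best reply $\tau$ to $\bar{\sigma}$ in $\Gamma_n(z)$ which can be seen as a strategy $\tilde{\tau}$
in $\mathcal{G}_n(z)$ and such that

\[
\gamma_n(z, \bar{\sigma}, \tau) = \gamma_n(z, \sigma, \tilde{\tau}),
\]

where the left-hand side is the payoff in $\Gamma_n(z)$ and the right-hand side is the payoff in $\mathcal{G}_n(z)$.

We proceed by backward induction. Let us fix a best reply $\tau$ to $\bar{\sigma}$ in $\Gamma_n(z)$. We will
construct a strategy $\tilde{\tau}$ which depends at stage $m$ on $h^{II}_m$ only through $y_m$. Recall that $\sigma$ is
fixed so that $y_m(h^{II}_m)$ can be computed by player 2. At first let us replace $\tau_n$ by

\[
\tilde{\tau}_n(y_n) = \mathbb{E}_{\mathbb{P}^z_{\bar{\sigma},\tau}}\big[\tau_n(h^{II}_n) \mid y_n\big].
\]
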
Note that this conditional expectation depends on the strategies $\bar{\sigma}, \tau$ up to stage $n - 1$. Let
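us prove that the payoff at the last stage $n$ is not modified.

\begin{align*}
\mathbb{E}_{\mathbb{P}^z_{\bar{\sigma},\tau}}\big[g(k_n, i_n, j_n)\big]
&= \mathbb{E}_{\mathbb{P}^z_{\bar{\sigma},\tau}}\Big[\mathbb{E}_{\mathbb{P}^z_{\bar{\sigma},\tau}}\big[g(k_n, i_n, j_n) \mid h^{I}_n, h^{II}_n\big]\Big] \\
&= \mathbb{E}_{\mathbb{P}^z_{\bar{\sigma},\tau}}\Big[\gamma_1\big(x_n, \sigma_n(y_n, x_n), \tau_n(h^{II}_n)\big)\Big] \\
&= \mathbb{E}_{\mathbb{P}^z_{\bar{\sigma},\tau}}\Big[\mathbb{E}_{\mathbb{P}^z_{\bar{\sigma},\tau}}\big[\gamma_1\big(x_n, \sigma_n(y_n, x_n), \tau_n(h^{II}_n)\big) \mid h^{II}_n\big]\Big] \\
&= \mathbb{E}_{\mathbb{P}^z_{\bar{\sigma},\tau}}\Big[\int \gamma_1\big(x, \sigma_n(y_n, x), \tau_n(h^{II}_n)\big)\, dy_n[x]\Big] \\
&= \mathbb{E}_{\mathbb{P}^z_{\bar{\sigma},\tau}}\Big[\mathbb{E}_{\mathbb{P}^z_{\bar{\sigma},\tau}}\Big[\int \gamma_1\big(x, \sigma_n(y_n, x), \tau_n(h^{II}_n)\big)\, dy_n[x] \ \Big|\ y_n\Big]\Big] \\
&= \mathbb{E}_{\mathbb{P}^z_{\bar{\sigma},\tau}}\Big[\int \gamma_1\big(x, \sigma_n(y_n, x), \mathbb{E}_{\mathbb{P}^z_{\bar{\sigma},\tau}}[\tau_n(h^{II}_n) \mid y_n]\big)\, dy_n[x]\Big] \\
&= \mathbb{E}_{\mathbb{P}^z_{\bar{\sigma},\tau}}\Big[\int \gamma_1\big(x, \sigma_n(y_n, x), \tilde{\tau}_n(y_n)\big)\, dy_n[x]\Big] \\
&= \mathbb{E}_{\mathbb{P}^z_{\bar{\sigma},(\tau_1, \dots, \tau_{n-1}, \tilde{\tau}_n)}}\big[g(k_n, i_n, j_n)\big].
\end{align*}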

The above equations show that the expected payoff at stage $n$ when player 2 is playing the
best reply $(\tau_1, \dots, \tau_{n-1}, \tilde{\tau}_n)$ against $\bar{\sigma}$ is a function of $\sigma$ and of the law of $y_n$. Assume now that
at step $m$, we have proved that there exists a best reply to $\bar{\sigma}$ of player 2 such that the sum of
expected payoffs for the stages $m+1, \dots, n$ is a function of $\sigma$ and of the law of $(y_{m+1}, \dots, y_n)$
only. We can replace $\tau_m(h^{II}_m)$ by $\tilde{\tau}_m(y_m) = \mathbb{E}_{\mathbb{P}^z_{\bar{\sigma},\tau}}[\tau_m(h^{II}_m) \mid y_m]$ without modifying the expected
payoff of stage $m$, with the same argument as above. Using assumption (A3), Lemma 4.9 and
the definition of $\bar{\sigma}$, the law of $(y_{m+1}, \dots, y_n)$ is not modified by this operation, which proves that
this modified strategy is still a best reply to $\bar{\sigma}$. □

Secondly, we prove that Player 2 can guarantee $v^*(\pi)$ by splitting the stages into blocks and
playing on each block separately, since he has no influence on the transition. The following
results are quite similar to the corresponding ones proved in Renault [10] and are reproduced
here since their proofs are very short.

Lemma 5.12. For every $\pi \in \Delta^*_f(K \times C' \times D')$, $n \geq 1$ and $m \geq 1$,

\[
\forall \tau_1, \dots, \tau_m, \ \exists \tau_{m+1}, \dots, \tau_{m+n}
\]

such that the strategy $\tau_1, \dots, \tau_m, \dots, \tau_{m+n}$ of player 2 is optimal in the game $\Gamma_{m,n}$.

Proof. Let $\pi \in \Delta^*_f(K \times C' \times D')$, $n \geq 1$ and $m \geq 0$, and $\tau_1, \dots, \tau_m$ such that $\tau_l : D' \times (J \times D)^{l-1} \to
\Delta(J)$. We define $\mathcal{T}^*$ as the subset of strategies of player 2 which start with $\tau_1, \dots, \tau_m$ and we consider
the game with the evaluation $\theta_{m,n}$ and the sets of strategies $\Sigma$ and $\mathcal{T}^*$. It can be seen as the
mixed extension of a finite game, thus the value exists and will be denoted $v^*_{m,n}(\pi)$. Since
the set of strategies of player 2 is smaller than $\mathcal{T}$, we have $v_{m,n}(\pi) \leq v^*_{m,n}(\pi)$. But using the
same method as for proving the recursive formula of Proposition 5.8, for any $\sigma$, we can build
a strategy which defends $v_{m,n}(\pi)$. Both values are therefore equal and any optimal strategy
in the restricted game satisfies the conclusion of the lemma. □

Proposition 5.13. For every $\pi \in \Delta^*_f(K \times C' \times D')$, player 2 can guarantee $v^*(\pi)$ in the
game $\Gamma_\infty(\pi)$.

Proof. We prove that for all $n \in \mathbb{N}^*$, Player 2 can guarantee the payoff $\sup_{m \geq 0} v_{m,n}(\pi)$. Let
$n \in \mathbb{N}^*$ be a number of stages, then for each $L \in \mathbb{N}^*$ we split the game of length $nL$ in $L$ blocks
of length $n$: $B_1, \dots, B_L$. We define the strategy $\tau^*$ by induction on the blocks.
Let $\tau$ be an optimal strategy in $\Gamma_{1,n}(\pi)$; then we set $\tau^*_i = \tau_i$ for all $i \in \{1, \dots, n\}$. Once
we have constructed $\tau^*_1, \dots, \tau^*_{nl}$ for some $1 \leq l \leq L - 1$, we define the game $\Gamma^*_{nl+1,n}(\pi)$
where player 2 has to play $\tau^*_i$ for all $i \leq nl$. We have $v^*_{nl+1,n}(\pi) = v_{nl+1,n}(\pi)$ using
the preceding Lemma. Let $\tau$ be an optimal strategy in $\Gamma^*_{nl+1,n}(\pi)$ and set $\tau^*_i = \tau_i$ for all
$i \in \{nl + 1, \dots, n(l+1)\}$. We have

\begin{align*}
\gamma_{Ln}(\sigma, \tau^*) &= \frac{1}{Ln}\, \mathbb{E}_{\sigma,\tau^*}\Big( \sum_{m=1}^{Ln} g(k_m, i_m, j_m) \Big)
= \frac{1}{L} \sum_{d=0}^{L-1} \frac{1}{n}\, \mathbb{E}_{\sigma,\tau^*}\Big( \sum_{m=dn+1}^{(d+1)n} g(k_m, i_m, j_m) \Big) \\
&\leq \frac{1}{L} \sum_{d=0}^{L-1} v_{dn+1,n}(\pi) \leq \frac{1}{L} \sum_{d=0}^{L-1} \sup_{m \geq 0} v_{m,n}(\pi) \\
&\leq \sup_{m \geq 0} v_{m,n}(\pi).
\end{align*}


The payoff being bounded, we deduce that this strategy guarantees $\sup_{m \geq 0} v_{m,n}(\pi)$. Finally, Player
2 can guarantee the infimum over $n \in \mathbb{N}^*$, $\inf_{n \in \mathbb{N}^*} \sup_{m \geq 0} v_{m,n}(\pi) = v^*(\pi)$. □
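
The averaging step in the proof above rests on a simple identity: the mean payoff over $nL$ stages is the mean of the $L$ block means, hence is at most the largest block mean. The Python sketch below checks this arithmetic on a fixed payoff sequence. It is an illustration only: the names `block_averages` and `total_average` and the sample payoffs are assumptions, and no strategic aspect of the game is modelled.

```python
from fractions import Fraction


def block_averages(payoffs, n):
    """Split a payoff sequence of length n*L into L blocks of length n and average each."""
    if n <= 0 or len(payoffs) % n != 0:
        raise ValueError("the number of stages must be a positive multiple of n")
    num_blocks = len(payoffs) // n
    return [
        Fraction(sum(payoffs[d * n:(d + 1) * n]), n)
        for d in range(num_blocks)
    ]


def total_average(payoffs):
    """Mean payoff over the whole sequence."""
    return Fraction(sum(payoffs), len(payoffs))


# Hypothetical stage payoffs for n = 2 and L = 3.
payoffs = [1, 0, 0, 1, 1, 1]
blocks = block_averages(payoffs, 2)
overall = total_average(payoffs)
```

Here `blocks` contains the three block means 1/2, 1/2 and 1, and `overall` equals their mean 2/3, which is below the largest block mean. In the proof, each block mean is in turn bounded by the corresponding value $v_{dn+1,n}(\pi)$.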

Since each player can guarantee $v^*(\pi)$, the game has a uniform value given by $v^*(\pi)$, which
concludes the proof of Theorem 2.3.